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The measurement of power spectra is a problem of steadily increasing im- 
portance, which appears to some to be primarily a problem in statistical esti- 
mation. Others may see it as a problem of instrumentation, recording and 
analysis which vitally involves the ideas of transmission theory. Actually, 
ideas and techniques from both fields are needed. When they are combined, 
they provide a basis for developing the insight necessary (i) to plan both the 
acquisition of adequate data and sound procedures for its reduction to mean- 
ingful estimates and (ii) to interpret these estimates correctly and usefully. 
This account attempts to provide and relate the necessary ideas and tech- 
niques in reasonable detail. Part II of this article will appear in the March 
issue of The Journal. 
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I. INTRODUCTION 

Communications systems and data-processing systems are generally 
required to handle a large variety of signals in the presence of noise. 
The design of these systems depends to a large extent upon the statisti- 
cal properties of both the signals and the noise. In most cases, the noises 
may be represented, or approximated, as stationary Gaussian random 
processes with zero averages, so that all of their relevant statistical prop- 
erties will be contained by the autocovariance function or the power 
spectrum. In many cases, the signals may also be represented, or ap- 
proximated, as stationary Gaussian random processes with zero averages. 

Noises, signals, or other ensembles of functions (given continuously or 
at intervals) which are approximately stationary but not Gaussian are 
often also usefully studied in terms of autocovariance functions or power 
spectra. Although the average and the spectrum are no longer the only 
relevant statistical properties, they are usually the most useful ones. 
Thus, we shall do well to keep as much of our treatment generally appli- 
cable as possible. 

In almost every case, the autocovariance function or power spectrum 
of either the noise or the signal will be of interest and importance. 

To determine the autocovariance function or power spectrum of an 
(approximately) stationary random process, we are often reduced to 
the necessity of measurement and computation. Exact determination 
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would require a perfectly-measured, infinitely-long piece of a random 
function (or a collection of pieces of infinite total length), and would re- 
quire infinitely detailed computations. Both of these requirements are, 
of course, impractical. Approximate determination, on the other hand, 
raises the questions of how much data of a given accuracy will be re- 
quired, what computational approach should be used, and how much re- 
liance may be placed upon the results. Practically useful answers to 
these questions may be found by combining results from transmission 
theory and the theory of statistical estimation. These answers prove to 
be relatively simple. The only major difficulty in their practical applica- 
tion is the extensiveness of the data required for highly precise estimates. 
This requirement is an inherent, irrevocable characteristic of such ran- 
dom processes. 

In this account we shall treat only the measurement of spectra of in- 
dividual noises or signals. The measurement and utilization of the cross- 
spectra of pairs of series is also important, but is beyond our present 
scope. Questions of distribution and anticipated variability of cross- 
spectral estimates, and of certain estimates derived from them, have re- 
cently been cleared up by Goodman. 1 

It is natural to feel that the measurement of power spectra is simple, 
and that no problems deserving extended discussion arise. After all, are 
there not commercial "wave analyzers" of many sorts; have not Fourier 
series served for many years to analyze the frequencies of many signals, 
(musical instruments, human voices, etc.)? Why should there be a serious 
problem? 

There are two reasons why elementary methods fail us rather fre- 
quently. On the one hand, the signal may not be available in indefinitely 
long time stretches. Either the conditions of observation, experimental 
or otherwise, or the difficulties of careful recording may make it imprac- 
tical to have so much data that we can afford to analyze carelessly. (The 
examples of Sections 26 to 28, involving spectra of radar tracking, 
noise in very short-lived devices, and irregularities in the earth's rota- 
tion, respectively, all illustrate this point). Even if observation and 
recording can be afforded, the cost of computation often forces careful 
analysis. 

On the other hand, the random nature of much noise, and some sig- 
nals, in which the relative amplitudes and phases of different frequencies 
are not stably related (in contrast to voices and musical notes), intro- 
duces much more difficulty with sampling fluctuations and provides 
much more significant appearing, thus much more misleading statistical 
artefacts than experience with simpler signals would lead investigators 



MEASUREMENT OF POWER SPECTRA 189 

to expect. (In postwar oceanography, for example, high mechanical in- 
genuity was expended in the construction of simple and effective wave 
analysers to produce detailed spectra of ocean waves. The results were 
quite misleading, because the frequency resolution obtained was too high 
for the limited length of records used, and almost the entire appearance 
of the resulting spectra was an illusion due to the particular fluctuations 
of the particular record. The use of broader filters has since led to mean- 
ingful results which could be related to physically satisfying theories.) 
All too often, the practical study of spectra requires care. 

Effective measurement of power spectra requires understanding of a 
number of considerations and action guided by all of them. Explaining 
each individual consideration is necessary, but it is equally necessary to 
explain how they fit together. The general structure of this description 
of spectral measurement is the following: an introduction to the concepts 
(Sections 1-3), brief accounts of individual considerations (Sections 
4-19), accounts of how these considerations are assembled in analysis 
(Sections 20-21), and planning for measurement (Sections 22-28, 
which include discussion of examples), and Sections in Part II giving 
the details supporting the earlier sections. 

We have attempted to provide, somewhere, most of the facts and atti- 
tudes that arc needed in the practical analysis of (single) power spectra. 
Readers interested in either completing their present knowledge or in 
gaining a brief overview of the subject may wish to proceed next to Sec- 
tions 20rT, whence they can be referred to specific sections of interest. 
For some, reading of Sections 1-3 may be a helpful preliminary for Sec- 
tions 20ff. For others, who want to build more solidly as they go, reading 
straight through, perhaps with considerable cross-reference to Part II, 
may be best. 

A function of time X(t) generated by a random (or stochastic) process 
is one of an ensemble of random functions which might be generated by 
the process. The value of the function at any particular point in time is 
thus a random variable with a probability distribution induced by the 
ensemble. Furthermore, the values of the function at any particular set 
of points, say t = U , (■• , • • • , /„ , have an rc-dimensional joint probability 
distribution also induced by the ensemble. Such probability distributions 
have an important bearing on the design of any communication system 
or data-processing system which must handle an output from such a 
random process, be this output "signal" or "noise". 

We shall often, but not always, assume that the random process is 
Gaussian. This means that, for every n, h t h, • ' ' > t» , the joint proba- 
bility distribution of 
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X{k),X(t 2 ), ...,!(«, 

is an n-dimensional Gaussian or normal distribution. Each such distribu- 
tion is completely determined by the ensemble averages 

X(U) = ave [X(U)}, 

and by the covariances 

C ij =COv{X(t i ),X(t j )} 

= ave [[X(U) - X(td]-[X(tj) - *(«,)]) . 

As a matter of convenience in development we will assume that the 
averages X(ti) are zero. The covariances then reduce to 

dj= ave iX(ti)-X(tj)}. 

Throughout, we will assume that the random process is stationary (that 
is, temporally homogeneous) in the sense that it is unaffected by trans- 
lations of the origin for time. The covariances C</ now depend only on 
the time separation U — t } - so that 

Co- = C{U - tj). 

Thus, the noise is completely specified by a single function of a single 
variable. In particular, C(0) is the variance (for zero average, the average 
square) of X(t). 

If the process were stationary, with zero averages, but were not 
Gaussian, then knowledge of the covariance as a function of lag, although 
providing a very large amount of useful information, would not com- 
pletely specify the process. The results of this paper fall into two cate- 
gories: (i) those relating to average values of spectral estimates, and 
(ii) those relating to variability of spectral estimates. The average- value 
results apply generally under the assumptions of stationarity (and zero 
averages), and do not depend upon the Gaussian assumption. The varia- 
bility results are exact under the Gaussian assumption, and are usually 
rather good approximations otherwise. Thus, our results have practical 
value for noises and signals which are not closely Gaussian. 

Results about variability are naturally used: (i) for planning the ap- 
proximate extent of measurement effort, (ii) for indicating the presence 
of changes, during a series of measurements, in the quantities estimated, 
and (hi) as a means of judging the precision of an over-all estimate. The 
results given here are mainly for the first planning use. The additional 
uncertainties in actual variability due to either non-normality of distri- 
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button, or to changing of conditions between runs, or to both, are often 
all too real, but are rarely large enough to affect planning seriously. The 
same is true of mild nonstationarity. The Gaussian, stationary results 
can also be applied to the second use, the detection of changes in the true 
spectrum, but considerable caution is in order. The precision of final 
over-all values is ordinarily far more wisely judged from the observed 
consistency of repeated measurements (as by analysis of variance of 
logarithms of various spectral density estimates at the same nominal fre- 
quency) than from any theoretical variability based on a Gaussian as- 
sumption. 

Communications engineers are more accustomed to work with a single 
time function of infinite extent than with an ensemble of finite pieces (of 
such functions). It is perhaps fortunate, therefore, that averages across 
an ensemble are equivalent (ergodicity) to averages over time along a 
single function of infinite extent, whenever a process is stationary, 
Gaussian, has zero averages, and has a continuous power spectrum (no 
"lines"). (If the process were not stationary the single function approach 
could not be used in this way.) 

Since we seek to make this account as intuitive as possible for com- 
munications engineers, we shall define transforms, and make many other 
computations in terms of averages along a single function (as limits of 
integrals over centered intervals). In dealing with more specifically sta- 
tistical issues, however, we shall write "ave" for average value, "var" 
for variance and "cov" for covariance, and shall do nothing to hinder 
the interpretation of these operators as acting across the ensemble. 
(Those who wish can also think of them in single function terms.) 

The covariance at lag r, in single function terms, is given by 

C(t) = limi f X(t)-X(t+ r)-dt 

T-*ao 1 J—Tll 

In ensemble terms, we would write merely 

C(r) = ave [X(jt)-X(t + r)}. 

The function C(t) is frequently called the autocorrelation function, 
although historical usage in both statistics and the theory of turbulence 
(Taylor 2 ) shows that this name should be applied to the (normalized) 
ratio C(t)/C(0). We shall call C(t) the autocovariancc function. This 
name is appropriate to our formal definition of C(t) because we have 
assumed that the averages of our process are all zero. Whenever we give 
up the assumption of zero averages, as we must almost always do when 
dealing with actual data, we shall use 
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ave {X(t)-X(t + r)} = average lagged product, 

ave J [X{t) — X] ■ [X(t + t) — X\ } = autoco variance function, 

where X is the common value of ave \X(t)} and ave {X(t + r)}, thus 
preserving accurate usage. 

Because of the direct relationship of the joint probability distribution 
to the autocovariance function, much of the statistical attention given 
to Gaussian stationary time series (time-sampled random functions) has 
been expressed in terms of serial-correlation coefficients (corresponding 
to lag-sampled autocovariance functions). 

A stationary Gaussian random process may be regarded (e.g. Rice 3 ) as 
the result of passing white Gaussian noise through a fixed linear network 
with a prescribed transmission function. White Gaussian noise, in turn, 
may be regarded as the superposition of the outputs of a set of simple 
harmonic oscillators (continuously infinite in number) with 

(a) a continuous distribution in frequency,* 

(b) uniform amplitude over the significant frequency range of the 
transmission system, and 

(c) independent and random phases. 

This point of view is particularly suited to the techniques employed by 
communications engineers. It is therefore not surprising that communi- 
cations engineers have dealt with stationary Gaussian random processes 
almost entirely in terms of power spectra. 

Because the autocovariance function and the power spectrum are 
Fourier transforms of each other, it would at first appear to be purely 
a matter of convenience which one is used in any particular situation. 
Indeed, optimum filter characteristics for the protection of signal against 
noise in communications systems and in many types of computing de- 
vices have, on occasion, been determined by the use of the autocovari- 
ance function. In practice, however, the filter designer almost invariably 
turns to the power spectrum as the final criterion of adequate design 
and performance. 

In practice also, where the autocovariance function or the power spec- 
trum must be determined by measurement and computation, and then 
interpreted, the choice is now heavily weighted in favor of interpretation 
of the power spectrum. Although a great deal of theoretical work has 
been done on the probability distribution of the serial-correlation co- 
efficients for Gaussian stationary time series of finite length, with a view 
to the estimation of the confidence which may be placed upon practical 

* The term "frequency" is used throughout this paper in the communications 
engineer's sense, viz., C3'cles per second of a sinusoidal wave. (Exceptional uses 
in the statistician's sense are explicitly noted.) 
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results, the criteria which have been developed along this line are so 
complicated that it is extremely difficult to apply them in practice, 
where the joint distribution must be considered. On the other hand, the 
situation with respect to the power spectrum is now very satisfactory 
for practical purposes. This stems from results obtained by Tukey, 4 
and in part independently by Bartlett, 5 about nine years ago, when 
studies were made of the effects of sampling, of finite length of series, 
and of choice of computational procedure on the behavior of the esti- 
mated power spectrum. Since that time, applications to such diverse 
fields as ocean waves (Marks and Pierson 6 ), aerodynamics (Press and 
Houbolt 7 ), meteorology (Panofsky 8 ), and seismology (Wadsworth, 
Robinson, Bryan, and Hurley 9 ), have shown the practical applicability 
of these results to a wide variety of physical time series. 

Shortly after these studies first reached the stage of practical useful- 
ness, the theoretical analysis was reformulated by Blackman, who ex- 
pressed it from the point of view of transmission theory, for presentation 
to members of the technical staff of Bell Telephone Laboratories 
(Out-of-Hours Courses 1950-1951, Communications Development 
Training Program 1950-1952). 

More recent contributions (1950-1957) to the theory of power spec- 
trum estimation have been reviewed by Bartlett and Medhi, by Bart- 
lett, 11 and by Grenander and Rosenblatt. 

2. AUTOCOVARIANCE FUNCTIONS AND POWER SPECTRA 

First, let us consider the ideal case. The autocovariance function which 
was defined in the preceding section by 

C(t) = lim^ f X(t)-X(t + r)-dt 

may be reduced to the form 

C(t) = f M P(f)-e i2 ' fT df 

where 



P(f) = lim ^ 



f ' X(t)-e- i2 *" dt 

J-rl2 



(cp. Section B.2). The function of frequency P(f) describes the power 
spectrum of the stationary random process considered. More precisely 
P(f) df represents the contribution to the variance from frequencies be- 
tween/and (/ + df). If we think of X(t) as a voltage across (or current 
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through) a pure resistance of one ohm, the long-time average power dis- 
sipated in the resistance will be strictly proportional to the variance of 
X{t). This important special case is the excuse for the adjective "power''. 
The pure statistician might prefer to refer to the covariance spectrum or 
to the second moment spectrum rather than to the power spectrum. 
For precision, we shall often refer to P(f) as the spectral density or 
■power spectral density. When no confusion is likely, we may call P(/) 
merely the power spectrum. 

The relation exhibiting the autocovariance function as the Fourier 
transform of the power spectrum may be inverted to express the power 
spectrum as the Fourier transform of the autocovariance function. Thus, 
we have 



P(f) = f C(r)-e 

J— CO 



,-t2x/ T 



dr 



The autocovariance function C(t) and the power spectrum P(f) are, 
formally at least, even functions of their respective arguments. Hence, 
the relation between them may be expressed more simply as two-sided 
cosine transforms, viz. 

C(t) = [ P(/)-cos2*/r-d/, 

•/-co 



and 



P(f) = f C(r) -cos 2tt/t-(/t; 

J— oo 

or perhaps even more simply, as one-sided cosine transforms, viz. 

C(t) = 2 f P(/)-cos2x/r-d/ 
Jn 



and 



P(f) = 2 [ C(r) -cos 2x/r-dr. 
Jo 



Results are usually more conveniently developed in terms of the two- 
sided forms than in terms of the one-sided forms. In Sections A.3 and 
B.4 for example, the use of the two-sided forms with exponential kernels 
will be found to simplify considerably the expression of the operation of 
convolution between functions of lag or of frequency. In Section B.6, the 
use of the two-sided forms with exponential kernels avoids some compli- 
cated manipulations of trigonometric identities in the early stages of the 
development. 
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It should be particularly noted that 

ave {X(t)-X(t + T )} = [ 2P(f) • cos 2tt/t • df 

and that (setting r = 0) 

var [X(t)\ = f 2P{f)-df. 
Jo 

Thus, it is evident that our definition of the power spectrum differs from 
the usual one which associates the power spectrum only with positive 
frequencies. References to the power spectrum in practice are usually 
in terms of a density 2P(/) over ^ / < °° only. 

3. THE PRACTICAL SITUATION 

In practice we can obtain only a limited number of pieces of X(t) 
of finite length. Each piece may be regarded as a sample drawn from a 
population or universe of pieces of X(t) of the same length. The reduc- 
tion of the data will therefore yield no more than estimates of the auto- 
covariance function and of the power spectrum — estimates which are 
subject to sampling variations and to biases in the usual statistical sense. 
This situation is further complicated in those cases in which we can 
measure, or desire to use, only values of X(t) at uniformly spaced values 
of t within each piece of X(t); in other words, those cases in which we 
are dealing with classical time series (discrete time) rather than with 
time functions (continuous time). 

The theoretical study of sampling variability and bias is much simpler 
in the case of the estimates of the power spectrum than in the case of 
the estimates of the autocovariance function (or of serial-correlation co- 
efficients). This reflects the fact that, as we consider longer and longer 
records, and two narrow frequency bands with an arbitrarily small but 
fixed separation, we may find estimates of the power in these frequency 
bands which both (i) become arbitrarily precise, and (ii) become arbi- 
trarily nearly (statistically) independent. The existence of such esti- 
mates is another particular consequence of the Gaussian character, as 
expressible in terms of "random and independent phases", of the ran- 
dom process from which we have one or more samples. 

Use of the power spectrum has an additional advantage over use of 
the autocovariance function. In almost all practical situations, the data 
analyzed does not represent the actual output of the random process. 
In such cases the data will have been modified, appreciably if not radi- 
cally, by the transmission characteristics of the devices employed to se- 
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cure the data. This modification of the data may in fact be intentional, 
as we shall see when we come to the discussion of "pre whitening" in 
Section 15. In any case, the estimates will have to be corrected for the 
effects of this modification of the data. For estimates of the power spec- 
trum, the correction procedure is a simple division of a frequency func- 
tion by another frequency function. For estimates of the au toco variance 
function, however, the correction procedure will require a Fourier trans- 
formation, division of the resulting frequency function by another 
frequency function, and an inverse Fourier transformation. This whole 
sequence of operations on the autocovariance function is the only prac- 
tical procedure for the inversion of the convolution (see Appendix A.3) 
which is the effect to be corrected for. (Details are given at the end of 
Section B.3.) 

As we shall see, the measurements and computational operations may 
involve the use of either analog or digital computation and handling of 
either continuous "signals" or discrete data. (Whatever be its rela- 
tion to some communication or data-handling system, we shall call con- 
tinuous-time signals or noise which we are analyzing "signals", while dis- 
crete-time signals or noise, or discrete-time samples thereof, will be 
called data.) In actual practice, and for well-defined reasons of in- 
strumentation and computation engineering, only a few of the many 
possible combinations are used. 

Spectrum analysis by analog computation is almost always applied 
to continuous "signals", and makes use of filtering rather than going 
through autocovariance or mean lagged products. Digital computation 
must be carried out on discrete data, perhaps time-sampled from a 
continuous "signal", and preferably uses an indirect route via mean lagged 
products rather than trying to isolate individual frequency bands di- 
rectly. In either case, each data value must enter several computations, 
and it is rarely economic to carry these computations out directly in real 
time, especially since there will not usually be enough such analysis on a 
regular basis to saturate the working capacity of the analog or digital 
computer used. Consequently, recording, either of "signals" or of data or 
of both, is almost inevitable. 

Thus, five stages will be important in nearly every case: 

(1) sensing (pick-up, conversion, etc.) 

(2) transmission (to recorder or, possibly, to computer) 

(3) recording (including play-back, and, perhaps, time-sampling) 

(4) computation (formulas, computing circuit performance, etc.) 

(5) interpretation. 

In every one of these stages, quality of performance (noise level, dis- 
tortion, etc.) will be of importance. 
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The present account concentrates on the computational and inter- 
pretational stages, but indicates, from time to time, those considerations 
in the other stages which are peculiar to power spectrum analysis. 

We have been unable to find a wholly satisfactory arrangement for 
the material we wish to present. In order to facilitate a relatively easy 
once-over, these introductory sections now continue into a condensed 
account, from which proofs, some reasons, and many helpful remarks 
have been postponed to the Appendix and sections in Part II. Readers in- 
terested in a survey may find it adequate to read only the condensed 
account. Others may find it best to skim this condensed account first, to 
read Appendix A next, and then to study similarly numbered sections of 
Part II and the condensed account. 

The continuous record of finite length will be treated first (Sections 
4-11); the modifications required for the discrete equally spaced record 
are covered next (Sections 12-21), and the opening account concludes 
with a discussion of the planning and analysis of measurement programs 
(Sections 22-28). 

Appendix A (Sections A.l to A.6) treats fundamental Fourier tech- 
niques, and the transform-pairs most closely associated with diffraction, 
in both the continuous and equi-spaced cases. 

Each section of Part II relates to the similarly numbered section of 
the main body, and contains details of derivations, further reasons, and 
additional helpful remarks. 

Definitions of the technical terms, arranged alphabetically for ref- 
erence, are included at the end of Part I. Similar definitions of the nota- 
tion will be given at the end of Part II. 

Continuous Records of Finite Length 

4. fundamentals 

Given a continuous record of finite length, it is clear that we cannot 
estimate the autocovariance function C(t) for arbitrarily long lags. 
Surely, no estimate can be made for lags longer than the record. Fur- 
thermore, as we will find in due course, it is usually not desirable to use 
lags longer than a moderate fraction (perhaps 5 or 10 per cent) of the 
length of the record. Thus, in place of 

C(t) = limjL [ X(t)-X(t + T)-(U 

for all values of r, we will have at our disposal 

CM = „ 1 | ■ [ ' X (t - iVx (t + I).* = Coo(-r) 
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only for | t | ^ T m < T n , where T n is the length of the record, and T m 
is the maximum lag which we desire to use. We will call Coo(t) the ap- 
parent autocovariance function, since (on account of ergodicity) its 
average value is C(t) for \ t \ ^ T m . 

The class of estimates for the power spectrum with which we are chiefly 
concerned will be derived from a modified apparent autocovariance Junction 
by Fourier transformation. While the modified apparent autocovariance 
functions, which are obtained by multiplying the apparent autocovari- 
ance function by suitable even functions of t, are often far from being 
respectable estimates of the true autocovariance function, their trans- 
forms are very respectable estimates of smoothed values of the true spec- 
tral density. 

Let Di(r) be a prescribed even function of r, subject to the restrictions 
7),(0) = 1, and D,(r) = for \t\> T m , (where i = 0, 1, 2, 3, 4, de- 
pending upon the shape of D,(t) for | r | < T m ), and let the correspond- 
ing modified apparent autocovariance function be defined by 

CM = ZMr)-Coo(r). 

We may regard D,(t) as a window of variable transmission which modi- 
fies the values of C 00 (t) differently for different lags. It is therefore 
natural to call /)»(r) a lag window. 

For any lag window which meets the conditions stated above, Ci(r) 
is calculable from the data. Further, it is clear that C { (t) = 
for | r | > T m although C 00 (r) was not defined there. Because Ci(r) 
is defined for all values of t, it has a perfectly definite Fourier transform 
Pi(f), which should satisfy the symbolic relation, 

Piif) = Qi(J) * Poo(f) 

where Q,(/) is the Fourier transform of Di(r), the asterisk indicates con- 
volution (see Appendix A.3 for discussion), and Poo(f) is the Fourier 
transform of Coo(r). However, Poo(/) is not determinate because Coo(t) 
is not specified for | t | > T m (and its definition cannot be directly ex- 
tended beyond J r | = T n ). Nevertheless, since 

ave {C,(r)} = !>,(')• C(r) 

where C(t) is the true autocovariance function, it follows that 

ave {?<(/)} =Qi(f)*P(f) 

where P(f) is the true power spectrum, that is, the Fourier transform 
of C(t). The average may be thought of as either across the ensemble, or 
along time. (The latter type of averaging would correspond to replacing 
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X(t) by X(t — X), thus changing the stretch of X(t) which is observed, 
and then averaging over X.) The corresponding explicit relation, viz. 

ave{Pi(/i)) = f Qi(fi-f)-P(f)df 

J— 00 

exhibits the average value of P,(/i) as a smoothing (average-over-fre- 
quency) of the true power spectrum density P(f) over frequencies "near" 
/i with weights proportional to Q,(/i — /). In a manner of speaking P,-(/i) 
is the collected impression of the true power spectrum P(/) obtained 
through a window of variable transmission Q,-(/i — /). It is therefore 
natural to call Qi(f) the spectral window corresponding to the lag window 
D<(t). 

The form just given for ave {Pi(fi)\ is natural for our two-sided defi- 
nition of power spectra, but, in order to view the result from the stand- 
point of transmission theory for real-valued signals, it is convenient to 
express the result in a form appropriate to a one-sided definition of power 
spectra. Taking advantage of the fact that Q»(/) and P(f) are even func- 
tions, we may write 

ave{2P 1 (/ 1 )| = r H i (f;f 1 )-2P(f)-df 
Jo 

where 

ff,(/;/i) = Qi(f + Si) + QiU - /0 

and where we recall that 2P(/) df is the amount of power between / 
and (/ + df) in the one-sided true power spectrum. Similarly, 2P,(/) df 
is the amount of power between / and (/ + df) in the one-sided estimated 
power spectrum. The function #,(/; /i) has one of the necessary proper- 
ties of a physically realizable power transfer function inasmuch as it is 
an even function of / as well as of f\ . In general, however, it does not 
have the property of being non-negative at all frequencies /. Neverthe- 
less, it is a convenient function to use in the analysis of the variability 
of the estimated power spectrum. It will be convenient to regard the 
average value of the smoothed power density estimate ave J2P,(/i)} 
as the result of passing the true power spectrum, through a "network" 
with power transfer function //,-(/; .A). 

We see that our procedures will lead us to estimates whose average 
values are a smoothing (average-over-frequency) of the true power spec- 
tral density P(f) over frequencies "near" f x , and not to estimates of 
P(/i) itself. The problem of choosing the shape of the lag window Di(r) 
so that its Fourier transform Qt(f) will be concentrated near / = is 
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almost identical to the problem of choosing an intensity distribution 
along an antenna so that most of the radiation from the antenna will 
fall in a narrow beam. From this analogy we will use such terms as main 
lobe and side lobes for the principal maximum and subsidiary extrema 
of Q ,-(/). (Indeed, any attempt to confine the power transfer function to 
too narrow a frequency band — too narrow in comparison with the re- 
ciprocal of the longest lag used — would be analogous to an attempt to 
construct a practical hyperdirective antenna.) 

It is not surprising that we are led to estimate a smoothed power spec- 
trum. With only a finite length of X{t) available, we should not expect 
to be able to identify frequencies exactly, and are, indeed, unable to do 
so. (The presence of neighboring frequencies with random phases will 
have effects similar to those of noise in preventing such identification.) 

5. TWO PARTICULAR WINDOW PAIRS 

In order to specify a particular family of estimates within the class 
of estimates defined in the preceding section, we have only to specify 
Di(r) or Qi(f). We would like to concentrate the main lobe of Q,(/) 
near / = 0, keeping the side lobes as low as feasible. In order to concen- 
trate the main lobe we have to make Di(r) flat and rather blocky. In 
order to reduce the side lobes, however, we have to make D,(r) smooth 
and gently changing. Since Z>,(t) must vanish for \ t\ > T m we must 
compromise. So far, cut-and-try inquiry has been more powerful in find- 
ing good compromises than has any particular theory. 

A simple and convenient compromise is represented by the lag win- 
dow (whose use is called "hanning") 

D,{r) = \ (l + cos ^;) for | r | < T m 

= f or | r | > T m . 

(Window pairs and 1 are discussed in Section B.5.) An alternative 
compromise is represented by the lag window (whose use is called "ham- 
ming") 

D 3 (t) = 0.54 + 0.46 cos ^ for \r\ <T m 

= for | t | > T„ . 

These lag windows and the corresponding spectral windows are illus- 
trated in Fig. 1. Notice that the main lobes are four times as wide as the 
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Fig. 1 — Lag windows D 2 and D 3 . Spectral windows Q2 and Q 3 . 

side lobes (excepting the split side lobes nearest the main lobes), and 
that the (normal) side lobe width is l/(2T m ). 

The general nature of the spectral windows in these two pairs is the 
same: a main lobe, side lobes at most 1 per cent or 2 per cent of the 
height of the main lobe. There are differences, which are sometimes rele- 
vant, but these may not be obvious. The two most important of these 
differences are : 

(a) The highest side lobe for the "hamming" (spectral) window is about 
£ the height of the highest side lobe for the "hanning" window, 

(b) The heights of the side lobes for the "hanning" window fall off 
more rapidly than do those for the "hamming" window. 

One difference favors one pair, and one the other. 

These and several other window pairs are discussed in Section B.5. 



6. CO VARIABILITY OF ESTIMATES — BASIC RESULT 

It is shown in Section B.6 that, strictly only under Gaussian cir- 
cumstances, the covariance of any two power density estimates of the 
sort we have been considering is given to a good degree of approxima- 
tion by 
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cov [2P i (f 1 ),2P j (f 2 )} pa f HM\to-HM\M>2TU)-4S 

Jo 

where the power-variance spectrum T(f) depends only on the true power 
spectrum P(f) and the effective record length T n , as described below. 
Thus, we may regard the covariance of the two power density estimates 
as the result of passing the power variance spectrum T(f) through two 
networks in tandem, one with power transfer function //,(/; /i), the other 
with power transfer function Hj(f; / 2 ). In other words, we may regard 
the covariance (of the estimates of the power spectrum) as the power 
remaining from the power-variance spectrum T(f) after passing through 
the two windows Hi(f; /i) and Hj(f; / 2 ) associated with the estimates 
themselves. 7/ the windows do not overlap, the estimates do not covary 
(at least not in terms of second moments). 
In particular, of course, 

var {2P,(/i)l = cov \2P i (f 1 ), 2P i (f 1 )} 

« C H i U\fi?-2YU)-df 

Jo 

to which we can give a similar interpretation. 

These results would become exact if we were to replace C o(r) by 

where | r | ^ T m < T n .In Coo(r) we averaged X(t - (r/2))-X(t + (r/2)) 
over an interval of t of length T„ — \t\, varying with t. In C 00 (r) we 
would be averaging X(t — (r/2))-X(t + (r/2)) over an interval of t of 
length T n — T m independent of r. We could actually do this because 
| t ± (t/2) I ^ T n /2 for | t | ^ T m . However, for values of | r J less 
than T m , Coq(t) would not make use of some values of X(t — (t/2))- 
X(t + ( T /2)) which are used in C o(t). Thus, C o(t) would be wasteful. 
It seems best, therefore, to use C o(t) for computation, but to approxi- 
mate its variability by the variability corresponding to a Coo(t) which 
could not be calculated from the actual values. This "approximate" 
hypothetical Coo(t) involves a fixed range of integration T n part way 
between T n — T m and T n . The situation is illustrated in Fig. 2, where 
the ranges of integration are shown for the actually "feasible" C o(r), 
for the Coo(t) which "wastes not", and for the Coo(t) which we use to 
"approximate" Coo(r). The shaded areas delineate the products which 
are actually available. 
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The best choice of an intermediate value depends somewhat upon the 
7),(t) and Dj(t) involved, and is discussed in Section B.6. In practically 
useful cases we may take 

rp' W im 

■* n ■* n 3 -* m • 

The power- variance spectrum is given approximately and closely by 
r(/) = 4 £ P(f + /') -P(f - f')-(^^) 2 df («' = 2rf). 

If we have p pieces of total length T n , and if, in computing our estimate 
of C(t) for each t, we combine all available lagged products 

X(t- (r/2)).X(«+(r/2)) 

without regard to which piece they came from, then we may use this for- 
mula for T(f) with 

rp' rp V rp 

In ~ In 3 l,n 

as a satisfactory approximation for the effective total length. 




Fig. 2 — Range of integration over t as a function of t in CooO"), Coo(r), and 
C!n(r). 
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7. COVARL\BILITY OF ESTIMATES — APPROXIMATE FORMS 

In assessing the covariability of estimates of the smoothed power spec- 
trum, the relative magnitudes of three distances along the frequency 
axis are important: 

(a) the distance 1/T» , the reciprocal of the effective length of record, 

(b) the least distance over which P(f) changes by an important amount 
for / near fi , and 

(c) the least distance over which #,(/; fi) changes by an important 
amount for / near / x (this is of the order of 1/T m and is usually much 
larger than 1/K). 

If P(f) changes slowly enough to make (b) larger than (a), we may 
use the approximation 

r(/)^|rtW)] 2 

■* n 

whence, approximately, 

COV {P,(/0, Py(/ 2 ) } « A r Pil(f)'PAf) -df 

1 n Jo 

where 

P*(f) = HtflttPU) 

Pdf) = Hi{f;f 2 )P(f). 
In the same terms we have 



and 



ave {Piifi)) = f PM)-df 
Jo 

ave {P,(/0) = f° PflW-df. 

Jo 



The relation of covariances to averages thus established may be rea- 
sonably interpreted as meaning that any cancellations occurring in the 
average values also occur in the covariances and variances. To the ac- 
curacy of this approximation, then, we appear to be using the data rather 
efficiently. 

If, on the other hand, the true spectrum, P(f), consists of a single 
sharp peak at / = /o , we may use the approximation, derived in Sec- 
tion B.7, namely 



GOV {P,(/i),P,(A)} 



[f^w-4r]{fpiiW-4r] 

rcave {P<(/i)}-ave (P>(/ 2 )}, 
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a result which is not influenced by T' n (so long as T' n does not become 
large enough for l/7''„ to become comparable with the width of the peak). 

8. VARIABILITY — EQUIVALENT WIDTHS 

If P(f) changes slowly in comparison with 1/T» , then, since 

var (P,(/i)| = covtfmWi)}, 
we may write down the dimensionless variability of P,(/i) itself as 
var {P,-(/i)l 1 

[ave [PiifM* r u w.* 

where 



W„ = = 



PnW-df 





[P«(f)] 2 -df 

is naturally called the equivalent width of Pn(f) = Hi(f; fi)-P(f). 

The longer the record, and the wider the equivalent width, the more 
stable the estimate. (Increasing the width also of course makes the esti- 
mate refer to an average power density over a wider frequency interval.) 

If, on the other hand, P(f) consists of a sharp peak, then, by the con- 
cluding remarks of the preceding section 

var {Pi(/i)j = 



[ave {P,(/i)}] 2 

The equivalent widths of some simple cases are as follows: 

1. If Pn(f) is a rectangle of width W which does not cross / = 0, 
then W e = W. 

2. If Pn(f) is a triangle of base W which does not cross/ = 0, vertex 
anywhere over the base, then W c = 0.75 W. 

3. If Pii(j) is proportional to 



sin 


O) + o»i 




sin 


CO — COl 


w 




w 






+ 






0) 


w 




CO 


— COl 

w 



i.e. has the shape of i/ (/;/i), where W = W maiD = 2W Bldo (these being 
the widths of main and side lobes, respectively), and if /i ^ \/T m then 

W e = 0.5 W = 0.5 IF m ain = H'^. 

4. If Pad) has the shape of // 2 (/;/i), i-e. is proportional to a hanning 
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(0.25, 0.5, 0.25) window, and if /i ^ 1/T m , then W e = 0.67 W maia = 2.67 

5. If Pn(f) has the shape of H 3 (f; fi), i.e. is proportional to a hamming 
(0.23, 0.54, 0.23) window, and if /i ^ 1/T m , then PT e = 0.63 W mailt = 

2.52 I7 Bi de • 

These cases are illustrated in Fig. 3, a single sketch sufficing for the 
last two. Note that W e is close to fPTmaia for practical windows, if 

A ^ l/r. • 

For our standard window pairs, hanning or hamming, the width of 
the normal side lobes is l/(2T m ) and, consequently, W e ~ 1.30/T m , 
if fx * 1/T m . 

These last three equivalent widths decrease somewhat as /i becomes 
small, and the values given should be halved for /i = 0. 

If P(f) varies linearly across #,(/; /i), then a calculation discussed 
in Section B.8 shows that W e will tend to fall in the range from I.15/T m 
to 1.23/ T m . (A rather peaked case gives 0.94/7 1 m .) When we allow for 
the fact that we are likely to be concerned with processes which are not 
quite Gaussian, whose variances of estimate are consequently likely 
to be somewhat larger than for the Gaussian case, a change correspond- 
ing to the use of a decreased equivalent width in the formula, the choice 

We ~r m 

which introduces a small factor of safety (not more than 1.3) seems de- 



w» » 




I*— W e *| 





U— W e — -»l 

Fig. 3 — Equivalent widths of some spectral windows. 
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sirable for planning purposes. Consequently, we shall plan according to 

var{Pi(/i)i „T m 
[ave{P.(/x)}?^n" 

If we plan to hold the RMS deviation of each of our estimates below 
one-third of its average value, we must, accordingly, keep T m /T„ below 
ji. Thus, as noted above, we shall ordinarily keep T m to a small fraction 
of T„ . 

In making more detailed studies of the variability of spectral esti- 
mates, further approximation will be convenient. It is important to 
note several reasons why we need not be too precise in making such ap- 
proximations. First, as noted earlier, the variability results depend on 
the noise being exactly Gaussian. Real noises (and especially real signals) 
need not be exactly Gaussian. Thus, even exact results in Gaussian 
theory would be approximations in practice. Second, the chief purposes 
of studying variability are first to choose, once for all, effective methods 
of analysis, and then, in each situation, to determine about how much 
data will be required for the desired or given accuracy. Again, approxi- 
mate results will be adequate. Third, it would not be safe to use the ad- 
vance estimates of variability as firm, guaranteed, measures of the sta- 
bility of the actual computed results in a practical situation, since other 
sources of variability may well contribute to the deviation of a particu- 
lar spectral density estimate from its long run value. (Non-constancy 
of total power level, even with distribution-over-frequency remaining 
constant, and failures of stationarity are two simple examples.) We 
must rely on observed changes from trial to trial as basically the safest 
measure of the lack of stability of our spectral density estimates. 

Thus, the purposes of variability theory are well served if its results 
are approximate — deviations of actual variability from theoretical 
variability of ±5 per cent, ±10 per cent or even ±20 per cent will be 
quite satisfactory. Judged by this standard, the variability theory based 
on (i) the Gaussian assumption and (ii) treating the distribution of the 
spectral density estimates as if they followed so-called "chi-square" 
distributions, as we shall do in the next section, will usually be very 
satisfactory. 

9. CHI-SQUARE — EQUIVALENT DEGREES OF FREEDOM 

If ?/i i !h f • " • , Uk are independently distributed according to a stand- 
ard normal distribution, that is, according to a Gaussian distribution 
with average zero and unit variance (and, consequently, unit standard 
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deviation), then 



Xi: 



2 i 2 ■ 

= 2/1 + ?/2 + 



+ y k 2 



which is obviously positive, follows, by definition, a chi-square distri- 
bution with k degrees of freedom. The coefficient of variation of x* , 
the ratio of RMS deviation to average value, is (2/k) m , so that, as k 
increases, Xk becomes relatively less variable. This statement also ap- 
plies to any multiple of Xk • 

A convenient description of the stability of any positive or nearly- 
positive estimate is its equivalent number of degrees of freedom, the num- 
ber of degrees of freedom of that xu some multiple of which it resembles 
(in average and variance unless otherwise specified). We can find such a 
k from 



k = 



2 (average) 2 

variance (coefficient of variation) 2 ' 



Interpretation is aided by Tables I and II. These tables are possible 
because the distribution of the ratio of any multiple of x* to the aver- 
age value (of that multiple) depends only on k. Thus, if k = 4, individual 

Table I 
Distribution of quantities which are distributed as fixed multiple of chi- 
square. Ratios of individual value to its average value exceeded with 
given probabilities. 



Degrees of freedom 


Exceeded by 90% of 
all values 


Exceeded by 50 % of 
all values 


Exceeded by 10% of 
all values 


1 


0.016 


0.46 


2.71 


2 


0.10 


0.70 


2.30 


3 


0.19 


0.79 


2.08 


4 


0.26 


0.84 


1.94 


5 


0.32 


0.87 


1.85 


10 


0.49 


0.93 


1.60 


20 


0.62 


0.96 


1.42 


30 


0.69 


0.98 


1.34 


40 


0.73 


0.98 


1.30 


50 


0.75 


0.99 


1.26 


100 


0.82 


0.99 


1.18 


200 


0.873 


1.00 


1.139 


500 


0.920 


1.00 


1.081 


1000 


0.943 


1.00 


1.057 



Examples: (1) If the long run average is 10 volts 2 /cps, then among estimates 
with 10 degrees of freedom, 10 per cent would fall below 4.9 
volts 2 /cps, and 50 per cent would fall above 9.3 volts 2 /cps. 
(2) If a single observed estimate, with 5 degrees of freedom, is 
observed to be 2 volts 2 /cps, then we have 80 per cent confidence 
that the true long-run value lies between 2/1.85 = 1.08 
volts 2 /cps and 2/0.32 = 6.25 volts 2 /cps. 



MKA.SU KF.MFAT OF POWER SPECTRA 



209 



Table II — Behavior of xi? ox Decibel Scale 







k required for intcrvalt of spread 


Fraction of 


Spread* of intcrvalt in dbt 




distribution 












10 db 


5 db 


2db 


l db 






1 


3 


11 




40% 


Q/Vk - 1 


42 


60% 


10/V* - 1 


2 


5 


28 


105 


so% 


167VA- - 1 


4 


11 


63 


250 


90% 


20/Vk - 1 


5 


18 


104 


410 


96% 


25/ y/k - 1 


8 


27 


161 


640 


98% 


29/Vk - 1 


10 


34 


207 


820 



* Accurate to nearest integer in numerator for k Si 4, except for 80 per cent, 
where 16 should he replaced by 15 for k g 11. Based on Tukey and Winsor. 13 
(Spread is the difference between the upper boundary expressed in db, and the 
lower boundary expressed in db.) 

t All intervals are symmetric in the probability sense, half of the non-included 
probability falling above and half below the interval. 

J Since we are dealing with measures of variance, analogous to power, lOdb = 
(factor of 10), and (number of db) = (10 logio ratio of variances). 



values of any particular multiple of Xi will, in the long run, fall below 
0.26 times their average value in 10 per cent of all cases (will be 5.8 db 
or more below average in 10 per cent of all cases). Similarly, individual 
values will, in the long run, fall below 0.84 times their average value (be 
0.7 db or more below average) in 50 per cent of all cases, and in 90 per 
cent of all cases will fall below 1.94 times their average value (be 2.9 db 
or less above average). Thus, in the long run, 80 per cent of all values 
will fall in an interval of spread (2.9) - (-5.8) = 8.7 db. 

Thus, for example, to obtain 4 chances in 5 that a single observed 
value will lie within ±30 per cent of the true value we require (see Table 
I) about 40 degrees of freedom, while to obtain 4 chances in 5 that a 
single observed value will lie in a prescribable interval of length 5 db, 
we require (see Table II) at least 11 degrees of freedom. 

The results of the preceding section indicate that, for an estimate of 
smoothed spectral density, when P(f) is smooth, the number of degrees 
of freedom is given by 



k = 2T' n W e = 



We 

A/' 



where the latter form expresses the number of degrees of freedom as the 
number of elementary frequency bands, each of width 

4/ - w. • 

contained in the equivalent width W e . 
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For design purposes, the relation of the last section (including the 
small safety factor) indicates that 

2 _2^> 

[var {P.-(/i)|]/[ave (P<(/i)H 2 " T m 

when P(f) varies slowly. (This will usually be the case if (i) k > 3 or 4, 
say, and (ii) P a (J) is a moderately smooth single hump. For, under these 
circumstances, P.iCO will not change rapidly in a frequency interval 
l/T' n and the same property can then be inferred for P(f) itself.) 

When, on the other hand, P(f) consists of a single sharp peak, we 
find, using the last result of Section 7, that k « 2, so long as 1/T'„ 
is not small enough to be comparable with the width of the peak. At 
first glance, this result may appear a little surprising, but when we 
notice that a single spectral line corresponds either (a) to frequency 
+/ and to frequency -f , or (b) to cos o)4 and to sin ad , or (c) to ampli- 
tude and to phase, it appears quite natural that a sharp line carries two 
degrees of freedom and not merely one. 

We may summarize the semi-quantitative study of the stability of 
estimates of the smoothed power spectrum as follows: 

(a) It is not necessary to judge stability with very high accuracy. 

(b) It is convenient to measure stability by analogy with the number 
of degrees of freedom associated with a multiple of a chi-square variate. 

(c) The equivalent number of degrees of freedom can be regarded as 
the number of elementary bands of width A/ in the equivalent width W c 
of the filtered spectrum 

2PM) = H i (f;f 1 )-2P(f) (/^0) 

if the result is not too small (say > 3 or 4) and P a (f) is moderately 
smooth. 

(d) If the filtered spectrum approaches a single sharp peak, the 
equivalent number of degrees of freedom for the corresponding estimate 
approaches two. 

In interpreting the concept of equivalent number of degrees of freedom, 
it may be helpful to imagine the continuous density of the filtered spec- 
trum replaced by a discrete set of ordinates, one per elementary fre- 
quency band. If these ordinates are p , Pi , Vi , •", tne natural approxi- 
mation to the number of degrees of freedom is 

, _ (po + Pl + P2 + • • • ) 2 

Po 2 + Pl 2 4- p 2 2 + 
as illustrated in Fig. 4. This approximation will usually be satisfactory 
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Fig. 4 — Equivalent degrees of freedom. 

as long as the effect on k of moving each ordinate around within its 
elementary frequency band can be neglected. (In more extreme cases, 
an approximation based on two ordinates per pair of elementary fre- 
quency bands is more precise.) 



10. DIRECT ANALOG COMPUTATION — GRADED DATA WINDOWS 

We have been dealing thus far with continuous time, and the com- 
munications engineer will naturally ask, "Why introduce autocovariance 
functions and all that, why not measure the spectrum by filtering, recti- 
fying, and smoothing?". The only fair answer is "By all means, do so if 
you can obtain, and maintain, the necessary accuracy economically!" 
Let us apply our results to such a measurement technique. 

Let X(t) be the noise or signal whose power spectrum P(J) we wish 
to study. Let us pass it through a filter of transfer function Y(f), and 
designate the result by X onl (t). Its power spectrum, Pout(/), will be given 
by 

PoUf) = I V(f) \ 2 -P(f) 

and if a section of A r o „t(0 of length T n is applied to an ideal quadratic 
rectifier and smoothed by a smoothing circut of infinite time constant, 
the result will be 



f 

Jo 



[X oui (t)] 2 dt. 
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The average value of this result divided by T n is 



[ 2P ut(/)d/, 
Jo 



and the number of equivalent degrees of freedom is the number of 
elementary frequency bands, of bandwidth 1/(2T„), contained by the 
equivalent width of | Y(J) | 2 -P(/). This last function is of the form 

(power transmission function) (original power spectrum) 

just as before. We see that the ideal process of filtering, rectifying, and 
smoothing the actual input has produced the same accuracy as the ideal 
process of calculating, modifying, and transforming the apparent auto- 
covariance, provided that | Y(f) | 2 = #,(/; /i) for a suitable choice of 
T m , f, and /i . This is what we ought to have expected, since we believe 
that either method extracts nearly all the information about the spectrum 
which the data provides. 

A few practical considerations deserve mention. They center around 
the actual switching sitations which can arise, especially when we have 
only a finite sample of the original noise. In Fig. 5, the watt-second meter 
includes quadratic rectification and integration functions which we think 
of as ideal. (It may be very important to allow for the fact that the 
"ground" position of switch a is not quite at the same potential as the 
zero of the input noise, but we shall neglect this effect for the moment.) 

Some four sorts of operation can arise according to the times at which 
switch b is operated. The watt-second meter may be connected either at 
the beginning of the running period T or after some interval of time 
(to allow initial transients to become negligible), and may be discon- 
nected either at the end of the running period T or after some interval of 
time (to allow the meter to reach a maximum). These four modes of 
operation are illustrated in Fig. 6. 

In Mode I, providing the initial waiting period is long enough to allow 
transients to become negligible, the filter output is essentially stationary, 
and the earlier discussion in this section applies. 



STATIONARY 
RANDOM 
PROCESS 



WATT -SEC 
METER 



DUMMY 

LOAD 



Fig. 5 — Schematic analog circuit. 



MEASUREMENT OF POWER SPECTRA 



213 



PRE-RUN 



ALL MODES 



SWITCH A DOWN 



FILTER CLEAN 
METER ZEROED 



SWITCH A UP 



SWITCH A DOWN 
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B DOWN 
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B up- 



-•-TO MAX. READING 
DIVIDE BY T 



MODE m 



•B UP 



*— B DOWN 
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DIVIDE BY T' 



Fig. 6 — Time histories of operation for different modes. 



In Mode II, all of the energy output is recorded on the meter, but the 
reading is divided only by the length of the input data. This mode is 
amenable to exact and complete analysis which is given in some detail 
in Section B.10. The results differ from those of Mode I in that the 
transform of the boxcar function of length T (running period) is con- 
volved twice into the spectral window. (Convolution is defined and dis- 
cussed in Appendix A.3.) If T is not large, the effects may be somewhat 
uncomfortable in that the spectral window becomes wider and more 
ragged. 

Mode III, discussed briefly in Section B.10, differs from Mode II 
by an additional convolution whose effect again disappears as T — > *> . 

Mode IV resembles Mode I in that the noise input is passed through 
the filter until transient effects have become negligible, when the meter 
is switched on at the filter output. It differs from Mode I in that the 
meter is read after a final waiting period. This seems to offer no advan- 
tages over Mode I, and will not be discussed further. 

The contrast between Mode I and Mode II is another example of what 
should now be becoming familiar. Mode I has no additional convolution 
in the spectral window. Mode II provides data economy by making it 
possible to integrate over the whole length of the available record. We 
should really like both advantages. 

We can, indeed, obtain most of both advantages, but only by replac- 
ing the sharp edges of the switched data window by the smoothed outlines 
of a graded data window. In other words, we need to introduce 
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X in {t) = B(t)-X{t) 

at the input of the filter, where B(t) vanishes except for < t < T, and 
is smooth enough to have its Fourier transform «/(/) concentrated near 
/ = 0. Details are discussed in Section B.10. 

Difficulties arising from the fact that the zero of the X(t) input might 
not be at ground are shown in Section B.10 to behave similarly to 
those arising from switching transients, namely, no effect in Mode I, 
possibly uncomfortable in Modes II and III, usually negligible when a 
well-chosen graded data window is used. 

Another device is sometimes used to make maximum use of a finite 
noise record. The record is merely closed into a continuous loop, and the 
rectifier-smoother output averaged. It is shown in Section B.10 that 
here, too, we must use a graded data window B(t). 

11. DISTORTION, NOISE, HETERODYNE FILTERING AND PREWHITENING 

Another group of very important practical considerations center 
around the spectrum of the "signal" as it is handled (either instan- 
taneously, or in recorded form). We have spoken of "filtering, rectify- 
ing and smoothing" and have treated all these steps as ideal. No atten- 
tion has been given to the equally vital "gathering" and "transmission 
and recording" steps. Tacitly, they too have been treated as ideal. 
Realistically, we must expect a certain amount of distortion (non- 
linearity, intermodulation, etc.) and the addition of a certain amount of 
background noise in all three of the first steps: gathering, transmission 
and recording, filtration. It often proves to be most important to lessen 
the ill effects of such distortion and noise addition. 

In a perfect system, and with a fixed spectral window, the fluctuations 
of an estimate are proportional to its average value. If we have a fixed 
uniform noise level, it will do the least additional damage if all the 
average values of the estimates are of about the same size, for then no 
low estimate can "disappear" into the noise. 

Intermodulation distortion will have the greatest effect on the signal 
being transmitted when two strong frequencies combine to produce a 
modulation product whose frequency falls in a very weak region of the 
spectrum, for it is in such situations that the fractional distortion of the 
spectrum reaches its maximum. To minimize possible effects of intermod- 
ulation distortion it is again desirable to transmit, record and generally 
handle signals with a roughly flat spectrum. 

To these noise and intermodulation considerations another sort of 
consideration may be added. Many frequency analyzers use a hetero- 
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dyne system, bringing the frequency band to be studied to a fixed filter, 
rather than tuning a filter across a wide frequency band. The power trans- 
fer function of the combination of heterodyne modulator and fixed 
filter, referred to input frequency, will depend only on A/, the deviation 
of | / | from | /o |, where /o is the nominal frequency of the fixed filter, and 
will be denoted by Q,-(A/). If demands at different frequencies differ, 
the shape of Q.-(Af) may have to be a compromise. One sort of demand 
arises when P(f) varies very rapidly. The net contribution near frequency 
/i to the average value of the spectral density estimate is measured by 
Hi(f; /i) -P(f), where, as elsewhere, H t {J\ A) = <?.(/ + /i) + Qi(f - /i). 
If our estimate is to be useful, only /'s near /i should have a substantial 
net contribution. If P(f) rises steeply as / leaves /i , we may have to 
require a very rapid fall-off in //,(/; /i), here practically equal to <?,-(/ — /i), 
in order to attain this as / leaves f t . We may thus be forced to compro- 
mise properties of Qi(A/) useful near other frequencies. The simplest 
way to avoid such problems is to arrange for the P(f) of the "signal" 
analyzed to be fairly constant, or at most slowly varying. 

Thus, for a variety of reasons, we can often gain by introducing "com- 
pensation" or "preemphasis" to make more nearly constant the spec- 
trum of the "signal" actually transmitted or recorded, and analyzed. 
Since the ideal would be to bring the spectrum close to that of white 
noise, it is natural to refer to this process as prewhitening. Such flattening 
of the spectrum need not be precise, or even closely approximate. We 
need only to make the rate of change of P{f) with frequency relatively 
small. 

Because of advantages related to the noise and intermodulation dis- 
tortion introduced in various steps of the sequence, it will be best, other 
considerations aside, to carry out such prewhitening at as early a point 
in the measurement-analysis sequence as possible. Sometimes this can 
even be done in the pick-up or sensing element. 

This whole philosophy of prewhitening, which appears quite natural 
to the communication engineer familiar with preemphasis and other 
techniques for increased information transfer within a given frequency 
interval, comes as a great change to the instrumentation engineer, whose 
clients ordinarily require "faithful" reproduction of an input at the out- 
put, by which they mean phase shifts nearly linear with frequency, and 
a nearly constant amplitude response up to some high frequency. It will 
be rare indeed, in practical spectrum analysis, that the ideal response 
for the initial transducer and amplifier will be flat. Instead it should 
have a characteristic contributing to prewhitening. This characteristic 
will, of course, have to be measured separately and the corresponding 
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adjustments to the estimates of the spectral density will have to be made 
so that these estimates, instead of applying to the "signal" actually 
analyzed, apply to the original "input signal", but such labor will often 
be many times repaid. 

One further consideration about frequency responses in measurement 
now enters naturally. In almost every power spectrum problem there is 
an upper frequency beyond which there is no appreciable interest. In 
most components used in measurement, transmission, recording, etc., 
the noise level, and often the level of intermodulation distortion, is 
roughly a fixed fraction of the peak useful level. If substantial power is 
present at frequencies so high as to be uninteresting, then the need to 
keep total power below the peak useful level forces us to handle the 
interesting frequencies at a power level below that which could other- 
wise be used. The ratio of noise and intermodulation distortion to inter- 
esting signal is thus raised — the quality of the analysis and its results 
degraded. The appropriate remedy is to filter out the uninteresting high 
frequencies at as early a stage as possible. This is a further reason why a 
carefully tailored frequency response is an important part of a power 
spectrum measuring process. 

Together with the need for adequately wide filters (we can of course 
use narrower filters when we are prepared to average over homogeneous 
records of sufficiently long total duration) to provide enough equivalent 
degrees of freedom, and hence enough stability for the estimates, this 
tailoring of frequency response is often the crucial part of a power spec- 
trum measuring program. Indeed, there may sometimes be no reason- 
able way to measure power spectra with an ill-tailored frequency re- 
sponse, even if this response be "flat". 

Equally Spaced Records 

We come now to treat a modified situation of great practical impor- 
tance, where the observations are used for analysis only at equally spaced 
intervals of time — not as a continuous time record. Two new and im- 
portant features enter: there is aliasing of frequencies, and practical 
analysis will involve digital rather than analog computation. In general, 
however, the situation is surprisingly similar to the case of a continuous 
record, with limitations on data-gathering effort still forcing us to com- 
promise resolution and stability. Advantages of convenient calculation 
and noise reduction still lead us to prewhitening. Filtering of equi-spaced 
data must involve transversal filters (see Glossary of Terms for defini- 
tion) whose transmission properties (in frequency) exhibit a periodic 
symmetry. This exerts additional pressure toward prewhitening. 
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Questions regarding computational techniques arise anew because 
of the nature of digital computation. These include means for reducing 
the effects of a displaced (perhaps drifting) zero, smoothing by groups 
to economize arithmetical operations on the whole, and preliminary 
rough estimation as an aid to planning. 

12. ALIASING 

We now suppose that X(t) is available, or is to be used, only for uni- 
formly spaced values of t, which we may as well suppose to be 

I = 0, At, 2At, 3At, ■■■, nAt, 

so that C(t) can only be estimated for 

| t | = 0, At, 2 At, •■-, nAt. 

Now, the equations 

C(r) = [ 2P A (/)-cos2x/r-d/, 
Jo 

\t\ = qAt, q = 0, 1, •••, w, 

if soluble at all, can always be satisfied by a P.i(/) which vanishes for 
/ > j N = 1/(2 At), although the power spectrum P(f) of the original 
process (for which the C(t) was defined) might actually cover a much 
wider frequency range. (We shall reserve the notation Pa(J) for such a 
function, vanishing for | / 1 > f N .) While frequencies between / = 
and f = Jn are clearly distinct from one another, we face a problem of 
aliasing, since frequencies above f N usually contribute some power. Each 
frequency, no matter how high, is indistinguishable from one in the 
band from to j N ■ 

The essential, unavoidable nature of this problem is made clear by 
Fig. 7 which illustrates how equally spaced time samples from any 



f = 4 



f = 1 




L 



|<--At=o.2--J 

ONE SECOND- 

Fig. 7 — Sampling of sinusoidal waves. 
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cosine wave could have come from each of many other cosine waves. 
(The familiar stroboscope uses a particular expression of this fact in 
apparently "slowing down" rapidly rotating or oscillating machinery.) 

The logical position about P A (f) depends very much on whether X(t) 
is thought of as having any real existence for 1 1 \ z* qAt. 

If X(t) really exists for continuous t, although we have (i) failed to 
observe or record it, or (ii) failed to "read" the record, or (Hi) decided 
to neglect the available values, then there is a well-defined P(f) cor- 
responding to the process from which each X(t) is a sample, and we must 
be very careful about the relation between P(f), which is our true con- 
cern, and P A (f), which is clearly all we can strive to estimate directly 
from the data. It can be shown (see Section B.12) that, in the form 
appropriate for a one-sided spectrum, if we set 

2P a (J) = 2P(/) + 2P(2/„ - /) + 2P(2/„ + /) 

+ 2P(4/„-/)4-2P(4f„+/) + ••• 



then we may take 



, otherwise 



where f N = 1/(2AQ is the folding (or Nyquist) frequency. We naturally 
call the frequencies /, 2f N - f, 2f N + /, 4/V - /, 4/V + /, and so on, 
aliases of one another, / being the principal alias. The aliased spec- 
trum P a (f) is the result of aliasing P(/). The principal part of the 
aliased spectrum P A (f) is the part of P a (f) which corresponds to 
principal aliases, positive and negative. 

(If X(t) has no natural existence for t'& which are not integral multiples 
of At, then P(f) is not uniquely denned, and we are at liberty to choose 
any normalization we desire. In particular, we may decide to limit P(/) 
to the interval | / | ^ 1/(2 At), in which case we will be enforcing P(f) = 
P A (f) without any trace of aliasing. We mention this case for logical 
completeness, but remark that it seems to occur infrequently in practice, 
whatever the field.) 

If the Gaussian noise we are considering has a power spectrum P(f) 
which extends outside |/| ^ 1/(2 At), then the Gaussian noise with 
spectrum P A (f) is not the same for continuous time. However, if we con- 
sider these two noises only for equi-spaced times 

t = 0, At, 2At, 

they are identical. For all first moments vanish and all second moments 
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coincide, which implies coincidence of the joint distributions of any 
finite set from • • • , X_ g , • • • , X_i , X , Xi , • ■ ■ , X q , • • • , and this is our 
definition of the coincidence of two noises. (If a result concerning such 
equally spaced values can be established for a Gaussian noise restricted 
to have P(f) vanish outside |/ | ^ 1/(2 At), it must trivially hold, under 
the same restriction, when all occurrences of P(f) are changed to Pa(J)- 
It is a consequence of the identification just established that the result, 
when expressed in terms of Pa(J), must also hold for any Gaussian noise 
whatever.) 

The frequency interval from to f N contains a certain number of 
elementary frequency bands in the sense of our treatment of variability. 
The total length of record is T n = nAt, and if we write T' n = nf At for 
the effective length, then, since 

1 

fs _ 2At 



elementary frequency bandwidth 1 

2~r; 



= n 



there are n' elementary frequency bands between and f N . As a statisti- 
cian would have anticipated, we gain one elementary frequency band — 
one degree of freedom — for each added observation. 

It is perhaps natural to base a hope on this result — a hope that 
taking data more frequently over the same time interval would gain us 
many degrees of freedom and reduce our difficulties with variability. 
However, this is not the case (as the expression for the width of an ele- 
mentary frequency band 1/(2T„) should have warned us). Taking ob- 
servations twice as frequently yields twice as many elementary fre- 
quency bands, but also doubles the folding frequency f N and, thus, 
doubles the frequency interval occupied by principal aliases. The density 
of elementary frequency bands is not increased one iota. (Clearly, iota 
was the Greek word for bit!). 

It is usual for aliasing to be present and to be of actual or potential 
importance. This is an inescapable consequence of data taken or read 
at uniform intervals. (It is not infrequently suggested that there should 
be a workable scheme of taking discrete data in some definite, but not 
uniformly spaced pattern, and estimating the power spectrum without 
aliasing. No such scheme seems so far to have been developed). 

13. TRANSFORMATION AND WINDOWS. 

Given uniformly spaced values of X(t) — values which we shall now 
designate X , X x , ■■-, X n , as well as X(0), X(At), ■■■, X(nAt) — we 
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expect to calculate "sample au toco variances", modify them, and then 
Fourier transform the results. There is no possibility of calculating auto- 
covariances for lags other than 0, At, • • • , nAt, and so we may as well 
write Co, d, ■■■, C m in place of C,(0), d(At), ■■-, d(mAt). If we 
Fourier transform these m + 1 numbers, as obtained or modified, we 
might obtain a smoothed spectral estimate for any frequency between 
and /at = 1/(2 At) that we may wash. It is not surprising, however, that 
we lose no information (and little explication) if we calculate only m -f- 1 
such estimates (one for each C r ). Nor is it surprising that we regularly 
take these estimates equally spaced over ^ / ^ f N , and hence at 
intervals of fy/m = l/(2mAt). As a consequence we have to deal with 
finite Fourier (cosine) series transformation (classical harmonic analysis) 
rather than with infinite Fourier integral transformation, but the cor- 
respondence between multiplication and convolution persists. 

The question of modification also requires discussion. In the continu- 
ous case we Fourier transformed 

C t (r) = Di(r)-C m (r) = DM-C 9 {t) 

where C (t) coincided with Coo(r) wherever the latter was defined, and 
is zero otherwise (cp. Section B.5). The result was, consequently (e.g. 
see Appendix A. 3), the convolution of the Fourier transforms of Z>,(t) 
and C (t). So long as time was continuous and computation was pre- 
sumably by analog devices, there was a real advantage to modification 
before transformation. Now that time is discrete and computation pre- 
sumably digital, the advantage is transferred to first transforming and 
then convolving. Indeed, because the Z>,(r), for i > 1, are finite sums of 
cosines, so that their transforms are simply sums of spikes (Dirac delta- 
functions) at the appropriate spacing, convolution means only smooth- 
ing with weights 

0.25, 0.5, 0.25 (i =2, hanning) 

0.23, 0.54, 0.23 (i = 3, hamming) 

and is very simply carried out. 

In discussing this program, we gain some generality by using m + 1 
lags separated by At = hAt for an integer h > 0, while our results are 
no more complicated than if we were to confine ourselves to h = 1, 
which is the practical case. Thus, we first compute the mean lagged 
products 

1 q =n— rh 

C r = 7 2—1 ■**■ Q ' **■ «+>"A 

n — rh g=o 
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for r = 0, 1, 2, • • •, m, where mh < n. Note that C r is heuristically as 
close as we can come to the apparent autocovariance Coo(±?"At) with the 
available (equi-spaced) data. Note further that, so far as functions of 
the C r are concerned, our effective folding frequency is 

* 1 1 

h =2Ar = 7l fN - 

We will usually need to adjust the C r somewhat to improve very-low- 
frequency performance, as discussed in Section 19, but this need not 
concern us for the moment. 

Applying a discrete finite cosine series transform to the sequence Co , 
C*i , • • • , C m , we find 



V r = Ai 



r m— 1 "I 

Co + 2 £ C 9 -cos ^ + C,„-cos r-K . 



(We may regard this as arising from replacing Co(r) in the expression for 
P (/) as its Fourier integral transform by a finite sequence of spikes 
(Dirac delta functions) of intensities (areas) proportional to the corre- 
sponding values of Cn(r).) If we put 



u \2wAr/ 



then it is shown in Section B.13 that 

ave{P ,(/)| = P Qo(f-f';AT)-P(f')-df 



where 



Qo(/; At) = At -cot -—-sin wiojAt. 

a 



In terms of Qo(f), which is treated in Section B.5, we have 

(M; At) = E m Qo (f-£) = Q»a(/). 

Just as the average value of P (f) in the continuous case is the cor- 
responding value of Q»(f) *P(f), so here the average value of Poa(/) is the 
corresponding value of Qo(/; Ai) *P(f). Thus, we may consider Poa(J) as 
estimating the result of "smoothing" P(f) with a window Q (/; At) which 
has repeated major (and concomitant minor) lobes at intervals of 
2/jv* = (At) -1 . This is not the most convenient way to consider matters, 
and in Section B.13 it is shown that there are two equivalent forms for 
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ave {Poa(J)\ and, correspondingly, two other, equally appropriate, ways 
to consider the situation. 
These arise from the three-fold identity 

QUf)*P{f) - (Mf)*P«(f) = QoM)*PA(f), 

any member of which represents the average value of P<u(0- Thus, we 
can also consider Po A (f)- (i) as estimating the result of smoothing the 
infinite, periodic aliased spectrum P a (f) with the same window as for 
the continuous case, or (ii) as estimating the result of smoothing the 
principal part of the aliased spectrum Pa(J) with the aliased window 
Qoa(/)- The latter choice is usually the most helpful of the three possi- 
bilities, and is the one we shall adopt. 

All this has been discussed for the immediate results of transforming 
unmodified C r 's. This is only the case i = of the identity 

Q<AO*P(f) = QiW *P*(f) - QiM) *Pa(D 

which holds in general. We should thus usually be concerned with Qa(/) 
and with P A {j)- 

The case i = 2 (hanning) corresponds to the following smoothing after 
transformation : 

U = 0.5 V + 0.5 Vi , 

U t = 0.25 V r -i + 0.5 V r + 0.25 V r+1 , lgrgw-1, 

U m = 0.5 V m -, + 0.5 V m , 

for which Qia(J) has the form shown in Fig. 8. The curve is for m = 12, 
and the circles are for m = «, which corresponds exactly to the con- 
tinuous case. Clearly, for usual values of m, the modification in the lobes 
due to aliasing is almost surely unimportant. 

The frequency separation between adjacent estimates is 

1 1 



2T m 2m At ' 
but the equivalent width of the windows (for 1 f£ r ^ m — 1) is about 

1.30 _ 1.30 

T m mAr ' 

just as for the continuous case (see Section 8). For most purposes we 
may again take the bandwidth corresponding to each estimate as 1/2'™ , 
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Fig. 8 — Aliased spectral window Q 2A for m = 12. 

so that m satisfies 

— = (bandwidth of estimates) -(Ar). 
m 

If we had neither modified before Fourier transformation, nor 
smoothed after transformation, we should have faced the uncomfortable 
minor lobes of QoAf) shown in Fig. 9 for m = 12 (with circles for m = <*>). 
Generally speaking, all we learned about desirable lag windows for the 
continuous case carries over with minor modifications, at most. The 
only serious effect of going to uniformly spaced values is the aliasing 
(and this may be very serious indeed). 

It is well worth noting that the possible spectral windows Q t A (f) are 
now restricted to be finite Fourier series hi cos wAt, cos 2o>At, • • • , 
cos wicoAt, or equivalently, to be polynomials in cos coAt of degree m at 
most. 

14. VARIABILITY AND COVARIABILITY 

We now extend all our other notation: #,(/; /i), P,-(/i), etc. to cor- 
responding H iA (f; /i), P.a(/i), etc. for the uniformly spaced case as 
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Fig. 9 — Aliased spectral window Qoa for m = 12. 

specified in Sections B.13 and B.14. It is shown in the latter section that 
we now have 



cov 



Jo 



where 

r.(/)^4/_>. ( /' + /).P.(r-/).(^) 2 .(^)- ! ,r, 

(a/ = 27r/'), with a very slightly different determination of T„ than be- 
fore. The only essential change has been the introduction of a new 
factor, corresponding to aliasing, 

'sin u'aA -2 
a' At ) 

into the integrand of the power- variance spectrum Tm(J). For usable 
values of n, this factor will vary much more slowly than 

sin Win 
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and can usually be treated as sensibly equal to unity. All the approxi- 
mate analysis of covariability and variability given for the continuous 
case now goes through without essential change. 

15. PREWHITENING 

If the equally spaced data is sampled from a continuously transmitted 
"signal" or "read" from a continuous recording, then all the points 
made in Section 11 in favor of early prewhitening are still applicable. 
If the equally spaced data arises more directly, as by photographing a 
physical situation, we may not be able to apply prewhitening early. In 
either case it may still be desirable to prewhiten after the data is obtained 
at equal intervals, either as a supplement to, or as a partial replacement 
for, early prewhitening. 

The average value of a power density estimate P,^(/i) is 

ave (Pa(/i)l = f Pux(f)-df, 
Jo 

where 

PUD = //,,(/;/.)-^.,(/). 

We want this quantity to tell us about the values of P(/) for/ near, A . 
To do this we must: (i) reduce variability, (ii) ensure that Pa(J) re- 
sembles P(f) sufficiently, and (iii) concentrate Pi.n(f) near/ = /i . Wc 
must be concerned with: (i) adequately broad windows, (ii) sufficiently 
weak aliasing, and (iii) enough sharpness in the effective filter. This 
sharpness can be obtained in a combination of ways. 

Note that wc asked for P,-.u(/), which measures the net contribution 
to the average value, to be localized. We did not merely ask that //,m(/;/i) 
should be localized. For, if 



although 

it is still possible for 

to outweigh 



P ,(/>) > > > P.4(/l), 

P,.u(./V) = //,,(/ 2 ;/i)-P.,(/2) 

PiAlifl) = H iA (fi;fl)-PA(fl), 



so that our estimate tells us more about P(/) near/ = / 2 than it does 
about P(0 near/ = /i . To avoid such unfortunate situations either we 
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must choose our window pair in a very particular manner (so as to make 
H iA (f 2 ; .fi) exceptionally small) or we must avoid PaU*) >>> Pa(J\)> 
Both courses are possible and sometimes necessary. Usually, the second 
course is simpler. 

Following the second course is simple in principle. Given actual values 
X q , we apply a selected linear procedure to obtain new values X q and 
analyze these. The aliased spectrum P A (f) of the X q differs from the 
aliased spectrum P,i(/) of the X q by a known multiplicative function of 
frequency. (See Section B.15 for details.) Thus, (i) we may convert 
estimates of P A (f) into estimates of Pa(D, and (ii) we may choose the 
linear procedure to make the aliased spectrum P,i(/) of the A, reasonably 
flat. 

The simplest linear procedures are probably the formation of moving 
linear combinations and the construction of autoregressivc series. A 
simple example of a moving linear combination is 



X, = X q — aXn-x — &Xg-i — yX q - 



for which the relation between the spectra is 

PaU) PU) ' 

= a cubic in cos u>At. 

A suitable moving linear combination will generate any desired non- 
negative polynomial in cos coAf. 

A simple example of an autoregressive combination is 

X g = X q + XAVi + nX q -2 + vX tl _, 

for which the relation (reciprocal to that just considered) between the 
spectra is 

tAH = W = (1 1 - xe"^ 1 - ne-""" ~ «*-*"" IT 1 

PaU) P(f) 

= (a cubic in cos <aM)~ . 

A suitable autoregressive combination will, when indefinitely continued, 
generate the reciprocal of any desired non-negative polynomial in 

cos u)M. 

By combining a suitable moving linear combination with suitable auto- 
regression, as for instance in 

X q = X q — aXq-i -\- AX<7_i , 



1 - ae'^' 


•j 


1 - Xe-' uA( 




L + a — 2a COS coA/ 
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which may also be written 

Xg — \Xg-l = X,, — CiXg-1 , 

for which 
PAf) = P(f) 

PaU) P(f) 

~ 1 + X 2 - 2\ cos o>A/ 

= a rational function of cos coA/, 

we can modify P A (f) by multiplication by an arbitrary non-negative 
rational function of cos wM. 

Freedom to multiply by any (simple) non-negative rational function 
of cos coM is very substantial freedom. If we have a rough idea (see Sec- 
tion 18) of the behavior of P A (f), and if this behaviour is moderately 
smooth, though perhaps quite steep in places, we can usually do a very 
good job of flattening the spectrum by pre whitening after obtaining dis- 
crete (digital) values. Unless still bothered with steep slopes, we will 
usually then find that hanning, with its (0.25, 0.50, 0.25) weights and 
lower outer lobes is slightly preferable to hamming, with its (0.23, 0.54, 
0.23) weights and reduced first minor lobes. 

The main purpose of prewhitening after data has been obtained in 
digital form at equally spaced intervals is to avoid difficulty with the 
minor lobes of our spectral windows. We may regard the whole process 
of prewhitening, analysis with standard spectral windows, and, finally, 
compensation of estimate, as a means of constructing a set of specially 
shaped spectral windows, one for each center frequency, specially adapted 
to the data we are processing. This point of view is illustrated in Fig. 10. 
The uppermost curve shows the power transfer function of a hypothetical 
prewhitening filter, one which enhances mid-frequencies in comparison 
with those lower and higher. The next line shows two standard spectral 
windows, with symmetrical side lobes. The third line shows the effective 
spectral windows when prewhitening is followed by standard analysis, 
as given by the product of prewhitening power transfer function and 
spectral window. In either case, the side lobe toward mid-frequencies is 
higher than the corresponding side lobe on the opposite side, which is 
lower than for the standard. The lowest curve shows alternative spectra 
for time series which might reasonably be processed by the combination 
of prewhitening and standard analysis shown (since the prewhitened 
spectra would change only slowly). In every case, the side lobes of the 
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special spectral windows are automatically so related to these spectra, 
as to balance and reduce the amount of leakage through them, as given 
by the product, of special spectral window side lobe and original spectral 
density. 




(a) 




(d) 

Fig. 10 — Illustration of prcwhitening; (a) prewhitening power transfer func- 
tion, (b) standard spectral windows, (c) effective spectral windows, and (d) typi- 
cal input spectra to which (a) might be applied. 

Easing of requirements for accuracy (number of significant figures, 
etc.) during computation are ordinarily quite secondary, though pleas- 
ant, advantages of prewhitening during digital calculation. 

16. REJECTION FILTERING AND SEPARATION 

If the difficulties in handling P(/) are due, wholly or in part, to one or 
more quite narrow and very high peaks ("lines" or "narrow bands") 
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then we cannot expect either to afford, or to be able to estimate, the 
great number of accurately chosen constants which would be required to 
obtain a rational function whose reciprocal has a shape very close to the 
given narrow peak. We must adopt a slightly different approach, and 
plan to make at least two analyses of the data — one to estimate the 
behavior at the peak, and another to estimate the behavior away from 
the peak. 

In order to separate the bulk of the information in the data from the 
variation associated with the sharp peak which may be troubling us, we 
may apply to the data a moving linear combination (possibly combined 
with autoregression) whose power transfer function (the factor by which 
the spectrum is altered) has one or more zeroes near the peak. The 
resulting sequence will be largely free of contribution from the peak and 
hence will be suitable for further prewhitening (if required) and analysis. 
(This operation can often, of course, be combined with further prewhiten- 
ing so far as actual calculation goes. It will of course be necessary to 
compensate for the effects of this transformation at frequencies away 
from the peak, when preparing the final spectrum estimates for interpre- 
tation.) 

There remains the estimation of the power in the peak, and possibly 
some inquiry into its width. A number of approaches are possible: 

(1) We may analyze the original data as well as the data with the 
peak rejected, obtaining an estimate at the peak and possibly confirma- 
tory estimates far from the peak. 

(2) We may subtract a suitable multiple of the modified data from the 
original data so as to retain the peak and partially reduce other fre- 
quencies; and then analyze the difference. 

(3) We may apply a band-pass filter to isolate frequencies at and near 
the peak, and then analyze the result. 

Any of these techniques may be applicable in suitable circumstances. 

Other related procedures are sometimes more natural than the use of 
moving linear combinations. Rejection of zero frequency, for example, 
is more naturally, and computationally more easily, accomplished by 
subtraction of the mean of all the data from each X q than by the sub- 
traction of a moving linear combination from each. 

Rejection filtration has been applied in oceanography by Groves, 
Seiwell," Seiwell and Wadsworth, 16 to the elimination of various well- 
defined tides from records. It almost always has to be used to eliminate 
possible peaks at zero frequency (see Section 19 below). 

In electronic measurements we may also anticipate its possible use in 
measurements: (i) close to a substantial harmonic of 60 cycles per 
second (such as 120 cps or 1380 cps), or (ii) near some strong "carrier". 
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17. SMOOTHING BY GROUPS 

The cost of digital power spectrum analysis, once initial investments 
in programming, etc. have been made, and assuming records to have 
already been made and "read", is likely to be associated with the number 
of multiplications involved in computing the mean lagged products (in 
original or modified form). If there are n observations, and m lags are 
used, then there will be roughly nm multiplications. 

Ways of reducing this number substantially are naturally of interest. 
Most of these must depend for their efficacy on our interest in something 
less than the whole spectrum. We have already discussed (in passing) a 
situation which would naturally arise only when we are interested only 
in the lower part of the aliased spectrum. This is the use of lags which 
are multiples of At = hAt with h > 1. The use of lags up to wAt = hmAt 
allows us to explore the spectrum down to frequencies almost of the 
order \/hmAt, which, had we used all multiples of At up to hm, would 
have required hm + 1 values of C r (or of its modifications) instead of 
ra + 1 . The price of doing this is the aliasing of the spectrum with fold- 
ing frequency 1/(2At) = (1A) (1/(2A£)), which is h times as much 
aliasing as if all multiples of At up to hm had been used, yielding a fold- 
ing frequency of 1/(2 At). 

If such intensive aliasing is bearable, this procedure with At > At is 
simple, even though it is not necessarily economical. Indeed, if so 
much aliasing were permissible, we need only have "read" every hth 
data value. In many situations, however, especially where At has been 
taken as large as aliasing will permit, such further aliasing is unbear- 
able. If we are to look at the low frequency part of the aliased spectrum 
P /1 (/)with computational economy, another course will have to be found. 

Our use of linear schemes in prewhitening shows us a possible course. 
Let us begin by applying a linear scheme to the given values X q , which 
attenuates all high frequencies. Then we can face further aliasing, and 
proceed apace. 

If simplicity is controlling, then we take 

X q = X g + X„_i + • • ■ + X^-k+i (A- terms) 

for which the relation between the spectra (the power transfer function 
of the smoothing) is 

P(f) _ " ' 

kccAt 



sin 



■2 



uAt 
sin -75- 
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This will give us zeroes at frequencies which are multiples of \/kAt, and 
we can avoid folding the first two side lobes of this function onto the 
main lobe and still take a folding frequency as small as 2/kM. Such a 
choice will fold the second, fifth, sixth, etc. side lobes onto the first side 
lobe, and it will fold the third, fourth, seventh, eighth, etc. side lobes 
onto the main lobe. We obtain such a folding frequency by retaining 
only one in every k/A of the X q 's. These decimated* X q 's may, in par- 
ticular, be obtained by summing the X q s in non-overlapping blocks of 
fc/4, and then summing these block sums in all possible (overlapping) 
sets of four successive blocks. (This requires (k + S)/k additions per 
original value.) The estimated spectrum below 1/kAt has to be multi- 
plied by 

. coAt 
sm — 



. laoAt 

sin — x— 



and only aliases which are usually negligible will have been superposed 
on the principal aliases. About one fcth of the original principal spectrum 
will be available for analysis. 

The stability obtained by this process can be easily compared with 
that obtained by using all X q and taking At = kAt/A. In each case, the 
width of the elementary frequency bands is approximately l/2T n 
where T' n has slightly different, but not substantially different values. 
The process just described yields nearly the same stability as At = kAt/4, 
and usually involves much less computation, besides avoiding serious 
aliasing. It will almost always be preferred to using At = hAt with h > 1. 

Other schemes of smoothing by groups are discussed in Section B.17. 

18. PILOT ESTIMATION 

The prewhitening procedure demands a rough knowledge of the spec- 
trum for its effective use. Sometimes this rough knowledge can be ob- 
tained from theoretical considerations, or from past experience, but in 
many cases it must be obtained from a preliminary (pilot) analysis of 
the data. Such pilot analyses should be as simple and cheap as possible. 
We now discuss a pilot analysis giving very rough results quite easily. 

Table III exemplifies a form of calculation which is easily carried out 
either entirely by hand, or with a desk calculator. The symbols "8" and 
"<r" refer to differences and sums of consecutive numbers in non-over- 
lapping pairs. Taking the numbers in non-overlapping pairs is not neces- 

* Although this word should refer strictly to the deletion of only every 10th 
item, we shall apply it to the retention of oniy every ,/th item, for wdnitever j may 
be relevant. 
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Continuation of Table III to the Right (Compressed) 
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sary, but saves much calculation at little cost in accuracy. (In this table 
sums and differences are entered in the lower of the two lines to which 
they correspond.) 

The final sums of squares are roughly proportional to the power in 
successive octaves coming down from the folding frequency. They differ 
by only a constant factor, equal to the number 2 of values X q used, 
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Fig. 11 — Pilot-estimated power spectrum. 

from the mean squares of a nested analysis of variance. For many pur- 
poses they can be used as they come. 

For the example of Table III we obtain sums of squares of 205, 441, 
207, 97, 1309, and 1521. These are plotted in Fig. 11 for the successive 
octaves f N to f N /2, f N /2 to / iV /4, / jV /4 to f N /8, / A /8 to f N /W, f N /lQ to 
/jv/32, and the remaining range /.v/32 to 0. We see that the spectrum is 
roughly flat. 

When medium or large stored-program digital computers are available, 
and the data is already available in machine-processable form (so-called 
diamond copy), it will often pay to use less elementary pilot calculations. 
Possible alternatives are discussed in Section B.18. 



19. VERY LOW FREQUENCIES 

The change from continuous "signals" processed in analog equipment 
to equally spaced "data" processed digitally has another important 
practical effect. Analog equipment, unless special care is taken, docs not 
respond all the way down to zero frequency, and this automatically filters 



234 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1958 

out the very lowest frequencies. This fact allowed us, in dealing with 
continuous records, to treat the "signals" being processed as if they had 
zero means. In dealing digitally with equally spaced data, all frequencies 
down to zero are transmitted, unless we take special precautions. Conse- 
quently, we must give serious attention to the very lowest frequencies. 

(We must now distinguish between power (in the sense of a line) at 
zero frequency and power density at zero frequency. The power spectrum 
of a stationary random process with zero means may have finite power 
density at zero frequency without having finite power there. However, 
finite power at zero frequency may be introduced into the data in meas- 
urement. It would then be desirable to filter out the power at (exactly) 
zero frequency without affecting the power density at and near zero 
frequency due to the stationary random process, but this cannot be done 
perfectly.) 

The need for such attention becomes clear when we consider the effect 
of "small" displacements of the average. Suppose that most of the ob- 
servations (say about 999 in 1000) lie between -100 to +100, with a 
few falling outside one limit or the other. This would be the case when 
the standard deviation is about 30, the variance about 900. If the average 
of the observations were 5 or even 10, we might or might not detect at a 
glance its failure to be zero. 

The total power is the square of the average (dc power) plus the vari- 
ance. Numerically, perhaps 25 + 900 = 925 or 100 + 900 = 1000. All 
the dc power belongs to the very lowest frequency band, whose width is 

If we have data at one second intervals for a period of 15 minutes, a total 
of 900 points, we will have a folding frequency of one-half cycle per 
second, and 900 elementary frequency bands before we reach the folding 
frequency. Thus up to one tenth of all the power may be concentrated in 
one 900th of the spectrum, so that the lowest frequency band has a power 
density up to 90 times that of the average of the 899 others. It is not 
surprising that precautions need to be taken to deal with such possi- 
bilities. (After all, our standard spectral windows have side lobes more 
than 1 per cent the height of the main lobe.) 

Slow trends, which may reasonably be regarded as zero-frequency sine 
waves, just as constant displacements are regarded as zero frequency 
cosine waves, are not nearly so likely to involve quite so substantial 
excesses of power density, but instances of this may and do arise. 

Any way of dealing with these effects must essentially remove the 
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lowest elementary frequency band, or both this band and the next to 
lowest one. In the process it will also have to eliminate some parts of the 
next higher elementary bands as well, since we cannot design a filtering 
procedure entirely free of side lobes. Two classes of ways of doing this 
are important. Either the XJs can be linearly altered, as by subtracting 
the mean of them all from each of them, before the mean lagged products 
are calculated — calculated from modified data as if they were original 
data — or additional computations may be made and combined with 
either the mean lagged products or their cosine series transforms. Thus, 
for example, the mean of all data may be calculated and the square of 
this mean subtracted from each and every mean lagged product. The 
effect of all of these modifications can, however, be summarized as apply- 
ing the finite cosine series transform to 

C r — Ekr 

where Ic identifies a specific method of modification, rather than to the 
C r alone. 
In place of 

ave {P u (/)| = Qa(/)*P*(fl, 

we shall now have 

ave [PiAtif)) = Q<amV)*Pa(J) = [Quifl ~ R*(f)]*PM 

where Ra(f) is related to the Ekr in the same way that Qi(f) is related to 
the C r . 

Details for certain special choices for E kr arc given in Section B.19. 
It is there concluded that, among others, satisfactory choices for prac- 
tical calculation appear, for the present, to be, for removing possible 
constants, 

E 0r = (X)' (independent of r) 

and, for removing the effects of both possible constants and possible 
linear trends, 

E ir = (XT- + l(l - I - * - %) (Z* - T-f 
lb \ n- n n 2 ' 

where X + and X~ are the means of the right- and left-hand thirds of the 

X values. 

Warning: It will almost never be wise to fail to use some Ekr in a digital 

computation. 
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Analysis in Practice 

The two sections which follow discuss the questioning and planning 
required whenever a digital analysis of equally spaced data is to be 
made, and exhibit a sample sequence of calculation formulas which 
might result from such planning. They are intended to summarize the 
previous material in its application to analysis. (Application to planning 
for measurement is treated next after this.) 

20. PRACTICAL ANALYSIS OF AN EQUALLY SPACED RECORD 

We may logically and usefully separate the analysis of an equally 
spaced record into four stages — each stage characterized by a question : 

(a) Can the available data provide a meaningful estimated spectrum? 

(b) Can the desires of the engineer for resolution and precision be 
harmonized with what the data can furnish? 

(c) What modifications of the data are desirable or required before 
routine processing? 

(d) How should modification and routine processing be carried out? 
Failure to adequately consider any one question properly, or failure to 
apply any one answer, can make the entire analysis worthless. 

The data presented will have come about by measuring some physical 
phenomenon at regular intervals. Thus, 

1 . the spectrum of the phenomenon 

2. the frequency response of the instruments used to make the meas- 
urements 

3. the probable magnitudes of measuring, and recording or reading 
errors, and 

4. the time separation between adjacent values 
are all relevant. 

The first stage of consideration is to inquire generally about these quan- 
tities, and to determine whether either aliasing (see Section 12) or back- 
ground noise is so heavy as to make the values almost wholly useless. 
Thus, if the spectrum is believed to extend up to 10 megacycles with 
substantial intensity, if the measuring equipment is flat to 1 .2 kilocycles 
and is 60 db down at 5 kilocycles, and if the values are measured every 
2^0 of a second, we may as well stop here and go no further, since the 
whole available spectrum (up to 100 cycles) will be aliased more than a 
dozen times over. (The 1.2 kilocycle measurement bandwidth, which will 
be aliased 12 layers deep, will control rather than the 10 megacycle 
phenomenon bandwidth.) 

If, on the other hand, the equipment was flat to 10 cycles, down about 



MEASUREMENT OF POWER SPECTRA 237 

6 db at 20 cycles, 15 db at 30 cycles, and 60 db at 50 cycles, we would 
not expect any irremovable aliasing difficulties, and would expect to be 
able to estimate the spectrum up to some moderate frequency — up to, 
say, 20 cycles, 30 cycles, or 40 cycles, depending upon how much back- 
ground noise was present. (The energy above 100 cycles would not be 
recorded.) 

In the next stage we should inquire into 

1. the frequency resolution required 

2. the fractional accuracy of estimation required, and 

3. the total duration of data available, and the number of pieces into 
which it falls. 

Items 1 and 3 can be combined and converted into the approximate 
number of elementary frequency bands (number of degrees of freedom 
— see Section 9 which is based on Sections 6 to 8) possibly available for 
each of the proposed estimates. This number can then be compared with 
the number of degrees of freedom required (also see Section 9) to give 
the desired fractional accuracy. If these are consistent, or if the desired 
accuracy, or the desired resolution, or both can be modified to make 
them consistent, then there is a good chance that the data can be per- 
suaded to yield the desired results, and further inquiry is indicated. If 
not, we should stop here. 

Explicit relations among duration, resolution, and fractional ac- 
curacy, the latter expressed in terms of 90 per cent interval (cp. Tables 
I and II), are given in Section B.23. These lead to an approximate 90 
per cent spread, expressed in db (decibels), of 

14 

\/(total duration in sees) (resolution in cps) — \ — I (number of pieces) 

a result which may often be conveniently used in such an inquiry. 
At the beginning of the third stage, information should be sought as to 

1. over what range of frequencies the spectrum is desired, and 

2. whether any lines or high and narrow peaks are to be expected, 
and at what frequencies. 

Guided by this information, it should be possible to decide whether either 
a. smoothing by groups (as in Section 17) to reduce computation 

without loss of low-frequency information, or 

1). rejection filtration (as in Section 10) to suppress well-established 

lines or high and narrow peaks, 

or both, are desirable. If desirable, they are then carried out before or 

during the next step. 

Unless advance information about the spectrum is exceedingly good, 
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a pilot analysis (see Section 18) to establish the rough form of the spec- 
trum will now be very much worthwhile. The result (or the very good 
advance information, if available) will now make it possible to choose a 
reasonable prewhitening procedure (or, possibly, to choose not to pre- 
whiten). Once suitable prewhitening (see Sections 11 and 15) has been 
chosen, and cither carried out or planned for, the third stage is complete. 
Finally, the information on resolution and accuracy combine to specify 
the width of spectral window desired, and hence (see Section 13) the 
number of lags for which mean lagged products should be calculated. 
When these are in hand, they are modified and transformed (or, perhaps 
more simply, transformed and convolved — see Section 13), adjusted to 
screen out very low frequencies, and the resulting power density esti- 
mates are corrected for the prewhitening, and for grouping and/or re- 
jection filtration (if any) used. The final estimates are best plotted on a 
logarithmic power scale, since their accuracy will be roughly constant 
on this scale. Crude confidence limits can then be calculated from the 
number of degrees of freedom (see Section 9) which would be present 
in the individual estimates if: (i) the process were Gaussian, and (ii) 
the pre whitened spectrum were flat. (The factor of safety of Section 
8 will ordinarily be adequate.) 

21. SAMPLE COMPUTING FORMULAS 

We cannot prescribe one set of computing formulas for general use, 
since there are rational reasons for different choices. All we can do is 
illustrate a procedure which may work fairly well in many cases. (And 
our example is not likely to be the only one with such properties. If the 
reader understands, by comparison with adjacent sections, just why we 
do what we do, he can compare other procedures with this example in a 
meaningful way. He will have to understand much of what is said in 
order to do this.) 

If X t , t = 0, 1, • ■ • , n are the given observations, which we will treat 
as if at unit spacing, it is likely that P A (f) decreases substantially as/ 
goes from 0.0 to 0.5 = f N . (If it does not, then aliasing is likely to have 
been serious, and satisfactory analysis at this spacing may be impossible.) 
Prewhitening by 

X t = X t - 0.6 X<_i 

which multiplies P A (f) by 

1.36 - 1.20 cos 2tt/, 
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a factor increasing from 0.16 to 2.56, may be a wise prewhitening. (The 
index t will now start at 1, and not at zero.) 
We calculate next 

c' r = — E x,x l+T - (± ± xX, 



n — r i \n 

namely mean lagged products with an adjustment for the mean. (Further 
adjustment for a linear trend might have been necessary. See Section 
19.) Let us suppose that we do this for /• = 0, 1,2, • • • , 24 = in. 
(Some other choice may have been appropriate.) 
Next we calculate the finite cosine series transform 

v r = \c + 2 x; c;-cos y± + c:- cos n 

and the results of harming (see Sections 5 and 13) 
C/o = *(7c + Vd 

U r - Wr-1 + W, + IVr+l , 1 ^ f ^ TO - 1, 

tf„ = |F m -i + |7« . 

These can then be corrected for both prewhitening and the correction 
for the mean by forming (see Section B.21) 

n 1 TJ 



1.36 — 1.20 cos 7^— 
bm 



' U T , 1 ^ r ^ m - 1, 



1.36 - 1.20 cos ~ 
2m 



1 



I.:k. - I /JO m- ( 1 - jp)2* 



as smoothed estimates of the power density. Estimates with subscript 
will apply in the range just above zero frequency, those with subscript r 
near a frequency of r/(2m) cycle per observation, and those with sub- 
script m in the range just below a frequency of 0.5 cycle per observa- 
tion. 

In interpreting these estimates four cautions are important: 

(a) aliasing of frequencies (see Section 12) may have taken place, 
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(b) the estimates are smoothed with a crudely isosceles triangular 
weighting function (see Sections 5 and 13) of full width 4/ (2m), 

(c) no estimate will be more stable than chi-square on (2n)/m degrees 
of freedom and, wherever the spectrum is not smooth, the stability of 
the estimates will be appreciably less (see Section 9), 

(d) adjacent estimates will not have independent sampling errors, 
though those not adjacent are at least very close to being uncorrelated. 

The units involved are such that the smoothed one-sided, aliased power 
density on 0.0 ^ / ^ 0.5 is approximated by twice the estimates. The 
pieces into which the variance would be divided, each coming from a 
frequency band of width l/(2m) cycles per observation, are estimated 
by l/(2ra) times the corrected estimates. 

Planning for Measurement 

Up to this point, with the exception of part of Section 11, our discus- 
sion has been concerned (i) with what happens when certain operations 
are performed, and hence (ii) with how we should make the best of 
what we already have. 

The third aspect — planning the measurements or observations to 
meet requirements — has not been adequately treated. (Both statis- 
ticians and engineers concerned with measurement will agree that this is 
the most vital aspect of all, but will, unfortunately, also have to admit 
that, all too often, "salvage" work will be required because this third 
aspect was omitted, and the observations made unwisely.) 

In discussing "What data shall we take?", "How shall we measure it?", 
the same considerations will recur as in discussing "How shall we analyze 
it?", but (i) they will be looked at from quite different aspects and (ii) 
they will be even more important. Now, by planning in advance of data- 
gathering, we may be able either to replace useless or difncult-to-analyze 
measurements by usable ones, or to avoid making measurements which 
could never provide the desired information. 

The first basic decision has to do with the type of recording and analy- 
sis to be used. Three types are in use today: 

(1) Spaced: Analog use of intermittent recorders (photography of 
situations or of dials, etc.) or digital recording at equally spaced inter- 
vals (electronic reading of dials, photography of counters, etc.). 

(2) Mixed: Continuous recording (on film, calibrated paper rolls, etc.) 
with the intention of analyzing equally spaced values to be "read" from 
these continuous records. 

(3) Continuous: Continuous recording (FM recording on magnetic 
tape, etc.) with the intention of making an analog analysis. 
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The choice among these types will depend on their particular advan- 
tages and disadvantages, and on the availability of equipment, both for 
recording and analysis. In almost every case, however, the detailed 
problems will be surprisingly similar. 

22. CHOICE OF FREQUENCY RESPONSE 

In each instance there will be a problem of the response of the ob- 
serving and transmitting or recording elements to high frequencies. 
When less quantitative studies are made, it is usual to worry whether 
the high-frequency response is large enough to "follow" the phenomena 
precisely. To be sure, if recording is only at intervals, and the needle is so 
blurred as not to be read, the high-frequency response may indeed be 
reduced by filtering. Such filtering is too likely to be regarded as un- 
fortunate rather than helpful. Effort tends always to be applied for 
"faithful" recording. This is appropriate for recording specific individual 
time histories for visual study, but is often most inappropriate for re- 
cording sample time histories for statistical study with the aid of sensitive 
Jitters (analog or digital). (When the recording is continuous, be it on film, 
oscillograph paper, or magnetic tape, the "writing" means has a limited 
frequency response, and this will usually help to keep the record from 
blurring.) 

When the analysis is to be made on equally spaced data, whether the 
recording be continuous or equi-spaced, there is a real problem of aliasing. 
And there is need for a basic choice of a frequency cutoff, usually in terms 
of two frequencies such that (i) the experiment is only concerned with 
frequencies up to the lower one, and (ii) frequencies beyond the upper 
one will not be recorded. The need for such a choice in a continuous 
system may not appear to be so acute, since only problems of noise or 
non-linear distortion are involved (see Section 11). Yet in practice, it 
will almost always be made — indirectly — by the choice of a writing 
speed (which implies a frequency cutoff for a continuous recorder). 
Economic pressures to reduce both the volume of record, and the extent 
of measurement and computation, act to lower the frequency cutoff, while 
desires to follow the spectrum to higher frequencies act to raise it. The 
proper choice comes from balancing these pressures. 

Sometimes in mixed systems, when continuously recorded data is to 
be subjected to equi-spaced analysis, an attempt is made to compromise 
matters by recording with a high cutoff, and then asking that the 
measurements of this record be "eye averages" over periods long enough 
for the record to show considerable variation. Such compromises do not 
seem to work nearly as well in practice as their proponents suppose. Re- 
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placing the "averages" by the results of "reading to the line" at equi- 
spaced points often seems to give better results, even though a smaller, 
but unknown amount of aliasing is thus replaced by a larger, known 
amount. Putting the filtering into the observing and writing equipment, 
rather than into the (human) measurer and transcriber, will usually do 
even better — better by a large margin. 

If one can be confident of the upper limit, beyond which the power 
spectrum will not be needed, it is usually best to record with a related 
frequency cutoff, thus reducing noise complications, aliasing difficulties, 
and the necessary bulk of the record. 

Conversely, however, points must be recorded or measured fre- 
quently enough (or a high-enough writing speed used) so that aliasing 
(or loss of high-frequency response) is not serious. (For a given maximum 
usable frequency, the sharper the cutoff, the less stringent this require- 
ment.) 

To summarize, the problems surrounding aliasing should lead to the 
choice of a frequency cutoff which is usefully described by two frequencies 
(which may reasonably be in the ratio of 1 to 2) : 

(a) a lower frequency, which is the highest at which important power 
spectrum estimates will be made, and 

(b) a higher frequency, at and above which no serious amount of re- 
cording is done. 

Both of these need to be chosen before settling finally on observing and 
recording equipment. If equi-spaced data is produced, the folding fre- 
quency may be as low as half-way between these two frequencies. 

A prime essential to keep in mind is that all measurement, transmis- 
sion, and analysis systems are essentially band-limited. It is always in- 
advisable to try to cover too many octaves of log frequency while using 
exactly the same techniques. 

23. DURATION OF DATA REQUIRED 

Instead of trying to compromise resolution and stability within the 
limitations of available data, we may now consider the costs and ad- 
vantages of getting still more data, or, perhaps, somewhat less data. 
We face a three-way compromise among effort, resolution, and stability 
(precision) of estimate. 

Effort has to be measured hi various ways, but the duration of initial 
record will almost certainly have to be considered as one measure. It is 
shown in Section B.23, where both precise definitions of the quantities, 
and a corresponding formula for the necessary numbers of pieces of a 
given length will also be found, that 
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1 200 , (pieces) 

, , , . . , x 2 (90% range in db) 2 3 

(total duration m seconds) = (resolution in cps) "" ' 

If, for example, a resolution of 0.1 cps is to be obtained from 6 pieces of 
record and is to furnish stability of ±2 db for (on the average) A of 
the individual estimates, then the necessary duration will be 

1 200 ,6 

= 150 seconds. 

0.1 

This applies equally to analog processing of continuous records or to 
digital processing of spaced records, so long as we apply the best methods 
which we know to a shape of spectrum which is not exceptionally diffi- 
cult to handle. 

24. AMOUNT OF DIGITAL DATA-HANDLING REQUIRED 

If spaced data are to be digitally processed, both the number of data 
points to be used and the number of multiplications involved are of 

interest. 

If we can easily build in the desirable frequency cutoff, and have to 
resolve a number of equally spaced bands spaced evenly from zero fre- 
quency to some maximum frequency, then we will require about 



_ -f + (pieces) (number of bands resolved) 

2 (90% range in db) 2 J 





(lata points and, roughly about 

[? + 18QQ + 3 (pieces) | (number of bands resolved) 2 

\2 (90% range in db) 2 F / 

muUiplications. 

These last two results often give only preliminary indications. Aliasing 
difficulties will increase these numbers. The possibility of smoothing by 
groups will decrease them. Details and possible modifications of the 
proposed system of data gathering and analysis need to be studied care- 
fully before final estimates of the number of data points and the rough 
number of multiplications are finally settled upon. 

25. QUALITY OF MEASUREMENT AND HANDLING 

In every case, careful consideration should be given to the quality of 
measurement and data handling required (in terms of the dangers of, 
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e.g.: time-varying frequency response, introduced noise, intermodulation 
distortion, etc.). An extensive catalog would be out of place here, since 
the problems are basically those of instrumentation engineering. But a 
few reminders may indicate the diversity of problems which might arise. 

A camera may be "clamped" to some object to record the relative 
orientation of that object and something visible to the camera. The 
mounting of the camera is never perfectly rigid, and vibrations will 
occur ordinarily at frequencies far above the data-taking rate. Whatever 
the frequency, these vibrations will introduce "noise" into the record. 
At least an order-of-magnitude calculation of the effects of likely vibra- 
tion is needed. 

Storage of a signal on magnetic tape will be a part of many measure- 
ment-analysis systems. Because only rough spectra are wanted, AM 
(amplitude modulation) recording may be planned. If the fact that AM 
recording and playback is subject to considerable fluctuation in over-all 
gain (db's, not tenths db) is neglected, measurement planning may be 
quite misleading. 

In a complex analysis, where several spectra and cross-spectra (whose 
analysis we have not specifically discussed) are involved, it might be 
planned to plot the estimates of each spectrum and cross-spectrum 
against frequency, draw smooth curves, and compute derived quantities 
from values read from these curves. Such a process has led to great 
difficulties in certain actual situations, because of the "noise" introduced 
by such visual smoothing which appears to have distinctive but unknown 
properties. Such a graphical step may appear to be good engineering, 
but it ('.'1111101 be high quality data handling. Its use may nullify the 
careful selection of other data processes, some of which are delicately 
balanced. 

Graphical analysis should ordinarily be reserved for: 

(a) display of whatever spectrum or function of spectra is really a 
final output, 

(b) description of the actual effects of computational procedures, and 

(c) trouble-shooting. 

26. EXAMPLE A 

Suppose first that the spectrum of some aspect of the angular tracking 
performance of a new radar is to be obtained; that angular tracking can 
only be studied by photographing the target with a camera clamped to 
the antenna; that frequencies near 0.27 cps are of special interest; that 
the spectrum of tracking performance at higher frequencies is relatively 
flat up to 10 cps and then falls rapidly enough to be negligible beyond 40 
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cps; that estimates at all frequencies up to 25 cps are desired; and that 
stability to ±1 db is derived. What are the requirements? 

The total amount of tracking required is fixed by the resolution re- 
quirement near 0.27 cps, which we may suppose to be either 0.05 cps or 
0.02 cps. These lead, respectively to durations of 

6 + 50 + 1) 55s > ]00 ° — ■"■• 



and 



i + 50 + |j ~ > 2500 seconds. 



Single stretches of either 16 or 40 minutes continuous tracking are al- 
most certain to be out of the question. The length of piece available 
would depend on the aspect of tracking performance studied, but a fair 
figure for this illustration might be 200 seconds. Going to Section B.23 
for the necessary formula, we find 



\ + 50 



(number of pieces) = r = jt-^z = 5.2 



or 



(200) (0.05) - | 



a + » 



(number of pieces) = - = jr-ss — 13.7. 

(200) (0.02) - ± *' D/ 

From a purely experimental point of view, these amounts of data are 
moderately hard to substantially hard to obtain, but we may suppose 
them available as far as radar and target availability are concerned. 

We come next to data taking and availability problems. We must 
study the spectrum up to 25 cps. Since the spectrum is negligible only 
above 40 cps, our folding frequency must be at least 32.5 cps, which 
would fold 40 cps exactly back to 25 cps. Hence we need at least 05 
frames a second. Consideration of available frame rates bring us to 64 
frames a second as probably reasonable. This is 12,800 frames in each 200 
second piece, a total film reading load of between 50 and 150 thousand 
frames. This will require some hundreds of man-days of film reading, 
but may perhaps be faced. 

To calculate directly the rough number of multiplications involved, we 
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may begin by assuming that we are going to require the 0.05 or 0.02 cps 
resolution all the way from to 25 cps. Were this the case, then we would 
require to resolve from 

to 

m - 1 > 250 

frequency bands. The corresponding numbers of multiplications range 
from 



to 



[4.5 + 450 + 3 (pieces)] (500) 2 « 120 million 



[4.5 4- 450 + 3 (pieces)] (1,250) 2 « 750 million. 



The running time of an IBM 650 calculator on such a problem is about 
10 hours per million multiplications, so that between 

1,200 hours = 30 shift- weeks 

and 

7,500 hours = 188 shift- weeks 

would be required. Clearly these machine times are out of line, and 
attention should be given to ways of reducing this aspect of effort. 

An application of smoothing by groups seems most likely to be effec- 
tive, especially since the high resolution is only wanted near the low fre- 
quency of 0.27 cps. Let us suppose that, in view of the supposed rather 
flat spectrum out to 10 cps, the engineers concerned will be content with 
two spectrum analyses, one with 0.5 cps resolution extending all the way 
to 25 cps, and the other with 0.02 cps resolution extending only to 1 cps. 
What effect will this have on the computational load? 

Notice first that it will have no effect on the radar-and-target operat- 
ing and film-reading loads. These were fixed by the resolution-precision 
requirements, and by the combination of this with the upper limit of the 
actual spectrum affecting the camera. Replanning details of the analysis 
will save nothing on either of these. 

The broad-frequency low-resolution analysis will resolve about 

25 

rr-= = 50 bands 

0.5 
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and require roughly 

[4.5 + 450 + 3(14)] '(SO) 1 = 1.24 million multiplications 

(since we shall need 14 pieces to obtain the required precision at a resolu- 
tion of 0.02 cps). This would require about 12.4 hours machine time, a 
quite reasonable amount. 

The preparation of data for the low-frequency high-resolution analysis 
— if we follow the suggestion of Section 17, requires less than 1.5 
additions per original frame, since each datum contributes to four means. 
This is at most 0.2 million additions and can probably be combined with 
the next step so as not to involve substantial machine time. 

The conduct of the low-frequency high-resolution analysis will resolve 
about 

m = 30 bands 

and will require about another 12.4 hours of machine time. 

Thus we have reduced machine time to about 25-30 hours, in pleasant 
contrast with the remaining requirements of some hundreds of hours of 
film reading and 14 test runs of 200 seconds each. The balance is ap- 
proximately restored. 

Our apparently blind use of the multiplications-required formula has 
concealed one important point. Our calculation of the time required for 
the high-frequency low-resolution analysis tacitly assumed that we have 
processed no more of the data than is required to meet the actual resolu- 
tion-precision requirement. 

The loosening of resolution from 0.02 cps to 0.5 cps in this part of the 
analysis has reduced by a factor 25 the amount of data which must be 
processed to meet the ±1 db (90 per cent) requirement. Hence the two 
hours machine time is predicated on processing only -Jgth of the available 
data. If only about -jV of the data is to be processed for the high frequency 
analysis, then it will be desirable to take the most typical 8 or 10 sec- 
onds from each piece. The losses due to end effects will be somewhat 
greater, it is true, but the advantages of increased coverage of the effects 
of unplanned variation, consequent on using parts of all 14 runs, far 
outweigh such considerations. 

It would be possible to use only one run for the high-frequency analy- 
sis, a possibility which emphasizes the fact that |fths of the film reading 
is done to obtain the raw material for averaging, for filtering out high 
frequencies. If the hundreds of man-days of film reading look out of 
line, mid if the line from the radar to the target is known not to change 
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rapidly (with respect to an inertial frame of reference), then we are driven 
to consider whether the "clamping" of the camera to the antenna could 
modified in such a way as to provide a frequency cutoff between antenna 
position and camera position. What would be desired would be a reliable 
mechanical filter with a cutoff at 1 or 2 cps, and substantial, reproducible 
transmission up to, say, 0.5 cps. If such a mount could be taken down 
from the shelf, then it would suffice to make (a) one 200-second run with 
a stiff mount and 04 frames per second, and, say, (b) thirteen 200-second 
runs with a mount of such designed softness, and, say 4 frames per sec- 
ond. The total number of frames for reading would now be 12,800 for 
run (a) and 800 for each run (b), a total of about 23,000 frames. This 
might require about a man-month to read, a saving of several man- 
months. Unfortunately, such a sharply-tuned low-pass mount would not 
be likely to be on the shelf. 

27. EXAMPLE B 

As a second example, suppose a new solid-state device develops a noise 
voltage with a power spectrum roughly proportional to 1// when under 
test under most extreme circumstances — circumstances so extreme that 
its average life is 30 to 50 milliseconds, and suppose that the detailed 
behaviour of this spectrum is believed likely to provide a clue to the 
proper theoretical treatment of some of the properties of this device. 
Suppose further that, while it was believed that the shape of the spec- 
trum of the noise from different examples of this device was the same, 
the voltage levels of different devices were quite different. It might be 
reasonable to ask for spectral measurements to ±0.25 db resolving 1 cps 
and covering from 1 cps to 500 cps. Direct measurements are likely to be 
most difficult, for the power between 499 and 500 cps is about , 2U ) 00() th 
the power between 1 and 2 cps, a difference of 51 db in level. Our re- 
cording and processing equipment is not likely to have the dynamic range 
required for direct analysis. 

Clearly we should prewhiten our noise as early in the measurement and 
analysis system as we reasonably can. Fortunately, prewhitening here is 




Fig. 12 — RL voltage divider. 
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operationally simple. A RL voltage divider, as indicated in Fig. 12 will 
introduce an attenuation of voltage, if the load impedance is high, 
amounting to 

R + jwL ' . . R 1 
juL 

If the original spectrum were 



«i + -in- 

brLr 



f- o>- 

then the pre whitened spectrum would be 

4tt 2 A 
i,i- 4ltt'AL~ 



1 "T •> r •> 



which will be initially constant, and then decrease 6 db per octave, with 
a corner at « c = R/L,f c = R/2irL. As a first step in measuring a spectrum 
out, to say, / = 2R/2irL, at which frequency the prewhitened spectrum 
would be down about 7 db, such a change would be useful. The range of 
frequencies which could be usefully studied would not be appreciably re- 
duced by such a change, even though the low frequency power level 
would be greatly reduced by the prcwhitening network, since the low- 
frequency power level would not be seriously reduced below the former 
power level at the corner frequency. If one could have been studied, 
the other can be studied. 

28. EXAMPLE C 

The irregularities in the earth's rotation have been studied by Brou- 
wer, 17 who reduced the available observations (times of occultation and 
meridian passage) by averaging over individual years. He states "oc- 
cultations so reduced in recent years have been demonstrated to yield 
annual means essentially free from systematic errors if the observations 
are well distributed over the year. . . . The 5's may themselves be the 
accumulations of numerous smaller random changes with average inter- 
vals much smaller than a year. The astronomical evidence throws no 
further light on this, though perhaps something may be gained by an 
analysis of residuals in the moon's mean longitude taken by lunations." 
These comments suggest that astronomical data can supply values once 
a year, possibly no more frequently, and may be able to supply values 
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about 13 times a year (once per lunation), certainly no more frequently. 
Let us accept the first possibility as a basis for an example. (This is the 
best example we know of a situation where equally spaced data cannot, 
in principle, be had at a finer spacing.) 

The information most nearly directly supplied by the astronomical 
observations is At, the difference between ephemeris time and mean solar 
time. Brouwer discusses two statistical models for its structure, both of 
which are most easily described in terms of the behavior of the second 
differences of the observations. In the first, the true second differences 
are constant over periods of varying length. In the second model, the 
true second differences are independently and randomly distributed. In 
either case, observational errors, independent from observation-period 
to observation-period also contribute to the observed A/'s. 

If we were to plan an observational program to decide between these 
hypotheses by spectral analysis we need first to specify the alternative 
spectra. The first model seems never to have been made as precise sta- 
tistically as the second. Brouwer's fitted curves correspond to constancy 
over periods of from 4 to 15 years. We should like to get a general idea 
of the possible spectra corresponding to this model without making the 
model too specific. Consider first a situation in which, except for the 
effects of second differences of experimental errors, the observations are 
constant in blocks of five, and where the values assigned to different 
blocks are independent. The successive average lagged products (start- 
ing with lag zero) are proportional to 5, 4, 3, 2, 1, 0, 0, 0, . . . and it fol- 
lows that the power density is proportional to 

1 + - cos vf/f N + ■= cos 2irf/f N + - cos 3x///at + =■ cos A.irf/f N . 
5 5 5 5 

Calculation shows that this is high near zero frequency, falling rapidly 
until, beyond about f/f N = 0.3, it consists of ripples with an average 
height of less than ^Vth the low frequency peak. If, instead of "constant 
by fives", the specific model were "constant by eights" or "constant by 
tens", still with independence between blocks, this peaking would be 
more pronounced and confined to still lower frequencies. If the lengths 
of the blocks were to vary at random, according to some distribution, 
still with independence of value, the spectrum would be the correspond- 
ing average of such spectra for fixed block lengths. The spectrum to be 
expected for second differences of annual average observations then should 
consist of a sum of two components: 

(1) a "true" component peaked at low frequencies, falling rapidly by, 
say, f/f N = 0.2 or 0.25, and continuing to///* = 1-00 with an average 
height perhaps 1 per cent or 2 per cent of the low-frequency value, 
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(2) an "observational error" component, corresponding to inde- 
pendent errors in the annual averages, and hence proportional to (1 — 
cos (7r//Xv)) 2 . 

In case the second model should apply, the first component would be 
replaced by one with a flat density. 

Fig. 13 shows the shapes of the three possible components. The natural 
way to try to distinguish between the two models by spectral analysis 
is to compare the spectral density in the middle range, say f/f N = 0.25 
to 0.5 with that in a lower range, say below f/f N = 0.25. According to 
Model I, the low-range density should be substantially higher than the 
middle-range density, the latter consisting of the effects of observational 
error (whose strength can be well estimated at the upper end of the spec- 
trum). According to Model II, the middle-range density should be 
slightly to somewhat greater than the low-range density, the increment 
representing effects of observational error. 

Without more detailed estimates of the relative sizes of the compo- 
nents, it would be difficult to specify exactly how many observations 
would be required to separate Model I from Model II, but 10 to 20 
degrees of freedom in each of the ranges discussed should be quite help- 
ful. This suggests 100 values of annual second differences, corresponding 
to 102 years of careful astronomy, as likely to be helpful. Since Brouwer 
gives annual values for 131 years, some 129 annual second differences are 
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Fig. 13 — Components for two models of earth-rotation irregularities: (1) "true 
irregularity" component for first model, (2) "observational error" component for 
either model, (3) "true irregularity" component for second model. 
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available for trial, and it may be possible to answer the question without 
waiting for many more years to pass. 

It might well suffice to estimate smoothed densities over octaves such 
as 0.0625 ^ f/f N ^ 0.125, 0.125 ^ f/f„ ^ 0.25, 0.25 ^ f/f N g 0.5 and 
0.5 S J/Jn ^ 1. Thus we might consider using the add-and-sub tract 
pilot estimation method for initial exploration. The actual analysis of 
Brouwer's data is considered further in Section B.28. 

Appendix A 

FUNDAMENTAL FOURIER TECHNIQUES 

In this appendix we review briefly certain aspects of Fourier transfor- 
mation. These aspects may be regarded as dealing mainly with diffrac- 
tion by slits, rectangular or graded, and by analogs made up of discrete 
"lines". Convolution and the so-called Dirac functions are specially 
important as convenient tools. Some parts of the discussion will have no 
direct bearing on the analysis of procedures for power spectrum estima- 
tion, but are intended to familiarize the reader with analytical tools 
which are used frequently throughout the remainder of this paper, and 
which may be used to advantage in many other analyses of a similar 
nature. 

A.i Fourier Transformation 

There are several formulations of Fourier transformation which differ 
according to custom, convenience, or taste. The formulation which we 
will adopt here is the one used by Campbell and Foster. 19 Given a func- 
tion of time, G(t), its Fourier transform is a function of frequency, and is 
given by the formula 

S(f) = f G{t)-e~ iut dt (u> = 2t/). 

J— oo 

Conversely, given a function of frequency, S(f), its Fourier transform is 
a function of time, and is given by the formula 

G(t) = f S(f)-e ial df («- 2t/). 

J— 00 

The term "frequency" is used here, not in the probability or statistical 
sense, but in the sense of sinusoidal or cisoidal functions of time (cos at, 
sin ait, e ). 

Our preference for the Campbell-Foster formulation is based on the 
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following points, arranged approximately in the order of increasing 
weight. 

1. Frequencies are expressed in cycles per second more naturally and 
much more frequently than in radians per second. (In our analysis we 
use w only as an abbreviation of 2wf, and only if it is typographically 
convenient.) 

2. Except for the sign of the exponent in the kernels, the transforma- 
tion formulae are symmetrical. The assignment of the signs here is the 
conventional one in transmission theory. 

3. In most of the applications to communications problems, the fre- 
quency functions are rational functions of p = ua, with real coefficients. 
Hence, the reformulation of the transformation of S(f) to G(t) as 



2irl J- ,» \2irl I 



is a natural and convenient step in the calculation of the integral by the 
method of residues. 

4. The transformation formulae correspond to the conventional rela- 
tions between the impulse response (response due to a unit impulse ap- 
plied at t = 0) and the transfer function (ratio of steady-state response to 
excitation, for the complex excitation e""") of a fixed linear transmission 
network. These network functional relations are commonly regarded as 
Laplace transformations rather than Fourier transformations. As a 
matter of fact, however, the circumstances in almost all practical appli- 
cations are such that there is no essential difference between Laplace 
transformations and Fourier transformations. Impulse responses are 
zero for / < and vanish exponentially as t — » « , and transfer functions 
are analytic on and to the right of the imaginary axis (including the 
point at infinity) in the complex p-plane. On the very rare occasions 
when a communications engineer might be interested in the behavior of a 
network under energetic initial conditions, he has ways of introducing the 
initial conditions without using Laplace transforms (Guillemin ). 

It should be noted that, since G(t) must be a real function, the real 
part of S(f) must be an even function, and the imaginary part of S(f) 
must be an odd function. The even part of G(t) and the even (real) part 
of S(f) are cosine-transforms of each other. The odd part of G(t) and the 
odd (imaginary) part of S(f) are negative sine-transforms of each other. 
It should be noted also that if G(t) and S(J) constitute a transform-pair, 
then G( — t) and S(—f) also constitute a transform-pair. Further, S(—f) 
is equal to S*(J), the complex conjugate of S(f). 
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a. 2 Some Transform-Pairs 

We will now turn our attention to some transform-pairs which we will 
require directly or indirectly in the analysis of procedures for power spec- 
trum estimation. We will use special symbols for some of these trans- 
form-pairs. For later reference, these transform-pairs will be collected in 
Table IV. 

The first transform-pair, which is easily worked out, involves a sym- 
metrical rectangular time function (box car of length 27',,,), viz. 



Do(0 = 1, 
_ i 

— 2> 

= o, 



t I < T m , 
1 1 = T m , 
1 1 > T m . 







Table IV 




1. 


Dob) -i,|* 

1 , 

- , | t 


< T m 
= T m 
> T m 


,, , ., „, sin uT m 
Qoif) = 2T m —^ 

wi m 

= 2r m dif 2fT m 


2. 


D(t) -i-^rj. 
= o, | 


t\ ^ T m 


= 7 1 ™ (dif/TJ 1 


3. 


50 - to) 


e-»'o 


4. 


COS tout 


i [«(/ +/ ) + «(/ -/.)] 


5. 


v„>(t;At) = - s(f + mu) 

m 

q=m-\ 

+ At- 2 8(1 - qAt) 
A— m+1 

+ - 8(t - mAt) 


Qo(/ 


A0 

wA< . 

A<-cot sin TWwAi 

2 

, dif 2f(m-At) 

2(m-AL) cos (t/-A/) — — — 

dif/-A< 


6. 


q=oo 

V(t ; At) = At- 2 

q=— oo 


l(t - qAl) 


il 


y At 1 „=-« \ At ' 


7. 


A (t ; At) 


'(/-a 
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The corresponding frequency function is 

Qo(f) = 2T m ^^ = 27V dif 2fT m . 

(1)1 m 

(The values assigned to D (t) at the end points \ t\ = T„ are those re- 
sulting from the transformation of Qo(f) to D (t). Of course the values 
assigned at these two points do not influence the result of the transforma- 
tion of D (t) to Qo(f))- Except for scale factors, this frequency function 
is the function dif u = sin mi/mi which recurs constantly in this subject. 
It is often convenient to regard it as the diffraction pattern (in frequency) 
due to passage through a rectangular slot (in time). The behaviour of 
dif 2fT m is shown in Fig. 14. 

The second transform-pair, which is almost as readily worked out as 
the first, involves a symmetrical triangular time function, viz. 

2)1(0 = * ~tI' '*' - Tm ' 

The corresponding frequency function is 

Qi(f) = T m fegr^ Y - r.(dif fTj*. 

Except for scale factors, this frequency function behaves as shown in 
Fig. 14. 

The third transform-pair involves a so-called Dirac function as the 
time function. The Dirac function is not a function in the strict mathe- 
matical sense. It is called a "measure" by L. Schwartz. 21 For our pur- 
poses, it will only be necessary to identify 8(t — to)-dt formally with 
dh(t — to) where h(t — ^0) is Heaviside's unit-step function, viz., 

hit - to) = 0, t < U 
= 1, t > to 

and to interpret all integrals as Stieltjes integrals. Hence if the time 
function (to use the term loosely) is 

G(t) = 8(t - to) 

then, the corresponding frequency function is 

S(f) = e' i0 "°. 

It should be noted that while 8(t — to) is easily formally transformed 
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into a frequency function, the latter is not so readily transformed into 
the original time function. 

The fourth transform-pair involves a symmetrical pair of Dirac func- 
tions as the frequency function. Thus, the time function 



G{t) = cos o) t 



(coo = 27r/o) 



corresponds to the frequency function 



S(fl = ![«(/ + /o)+«(/-/o)]. 

If the reader is disturbed over the fact that we are evidently going to 
base our analysis, at least initially, on the use of Dirac functions, he 
should note that Dirac functions are always paired with functions which 
are used widely and freely in transmission theory although they are not 
realistic in a physical sense. Functions of time, such as cos wot, which 
represent an infinitely long past and future history of activity, are not a 
bit more realistic in a physical sense than are "infinitely sharp" lines in 
the frequency spectrum. Similarly, functions of frequency, such as 
exp(— iuto), whose absolute values do not vanish as/— > «', are not a bit 
more realistic than impulsive "functions" of time. Nevertheless, as we 
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will see later on, these unrealistic pairs may be used as convenient bases 
for a wide variety of realistic pairs. They thus serve a very useful purpose. 
The fifth transform-pair involves a finite Dirac comb as the time func- 
tion, viz. 

V m (t; At) = ^ 8{t + mAt) + At- T, 8{t - qAt) + ^8(t- mAt). 

2 q=m+l ~J 

This is clearly a discrete approximation to D (t) for T m = m-At. The 
corresponding frequency function, which is easily worked out with the 
help of the third transform-pair (summing the exponential terms before 
introducing trigonometric equivalents), is 

coAt , -, / , . .\ dif 2f(m ■ At) 

Q„(f;At) = At cot — -smmwAt = 2(m-At) cos (wf-At) ' — . 

Except for a scale factor, the initial behaviour of this frequency function 
is illustrated in Fig. 9. Clearly, since cos = dif = 1, the limit of 
Qo(/; At), when At -» with m-At = T m held constant, is Q a (f). This 
corresponds to the formal convergence of V m (i; At) to A(0- 

We have denned this finite Dirac comb with a half-sized Dirac func- 
tion at each end because the corresponding frequency function has 
smaller side lobes, relative to the main lobe, than for the finite Dirac 
comb with a whole Dirac function at each end. This is easily seen from 
the fact that the effect of adding a further half-sized Dirac function at 
each end of V m (t; At) is to add At -cos muAt to QoC f ; ^)- 

The frequency function Q (/; At) is periodic, with a period of 1/Ai cps. 
It is symmetrical about everj integral multiple of \/{2At) cps. Thus, it 
has an absolutely maximum value of 2m- At at the integral multiples of 
1/AZ cps. It is zero at the integral multiples of l/(2mAt) cps which are 
not integral multiples of 1/At cps. For large values of m and small values 
of coAt, it behaves approximately like Qo(f)- 

The sixth transform-pair involves an infinite Dirac comb in time, and, 
as it turns out, also an infinite Dirac comb in frequency. The time func- 
tion is the formal limit of V m (t; At) as m — > », namely, 

g=oo 

V(<; At) = At- £ 8(t - qAt). 

q=— oo 

The corresponding frequency function is 



*«*a«- 



a-£'('-f)— (*£)■ 
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This may be surmised from the fact that 

Qo(/; At) df = 1 

J-1/(2A() 



for any m 



while 



lim [ Q (f; dt) df = - Urn Si(2*meM), (where Si(x) = f ^^ dy 



>0 y 



= 1 for any e in < e < 



2At 



The result may indeed be obtained by applying the fourth transform- 
pair with T m — m-At to the formal Fourier series representation of the 
infinite comb 



V(t; At) = 1 + 2 £ cos ?^ 

5=1 At 



Since 



V m (t;At) = Do(t)-V(t;At) 
we also have, as we shall see in the next section, 



Q (f; At) = Q (f) * A (f; i) 



The seventh transform-pair arises from the sixth by dividing by At 
on both sides. 

a. 3 Convolution 

If G(t) = Gi(t)-G2(t), then the Fourier transform of G(t) may be ex- 
pressed in terms of those of Gi(t) and Gi{t) as follows. 



S(f) = f G 1 (t)-G 2 (t)-e-' u, dt, 

J— oo 

= /"°Gi(/)-rr5,(f)-6 a ' f< d6 

v— 00 J— CO 

-£[£Oi(d-«"* Cf - |>l «B 

•/—CO 
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This relation, in which &(/) and &(/) are interchangeable, is commonly 
expressed in the symbolic form 

S(f) = S 1 {f) * S 2 (f). 

The implied operation on &(/) and S 2 {f) is called a convolution. In par- 
ticular, S(f) is said to be the convolution of Si(f) with S 2 (f). 
Similarly, if S(f) = &(/)■&(/), then 



G(i) = f Gxit - X)-G 2 (X)dX 

v — CO 



= Gi(0 * G»(0. 

Thus, multiplication and convolution constitute an operational trans- 
form-pair. 

(Convolution is often called by a variety of names such as Superposi- 
tion theorem, Faltungsintegral, Green's theorem, Duhamel's theorem, 
BoreFs theorem, and Boltzmann-Hopkinson theorem.) 

It may be noted in the detailed derivation above (putting / = 0), 
that 

f Gi{t)-CM)dt = f M &*(/)•&(/)•# 

J— 00 J— oo 

where &*(/) is the complex conjugate of Si(f). This is Parseval's theorem 
of which a very useful special case is 



f [G(t)f dt= [ M \ 8(f) l 2 df. 



An example of convolution is supplied by the symmetrical triangular 
time function in the second transform-pair. This time function is the 
convolution of two symmetrical rectangular time functions from the 
first transform-pair, with appropriate scalar adjustments. Another 
example is the infinite Dirac comb V(t; At), which may be regarded as 
the convolution of the finite Dirac comb V m (/; At) with the infinite 
Dirac comb A(t; 2m At), that is 

V(t; At) = V ,„(/.; At) * A(t; 2mAt). 

As the reader may easily verify, this corresponds to 

Convolution of time functions occurs in communications systems when- 
ever a signal is transmitted through a fixed linear network. If the input 
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signal is Gi(t), and if the impulse response of the network is W(t), then 
the output signal is* 



G(t) = f W(t - X)-Gi(X)dX 

J — co 



= W(t) * G 1 (t). 

The so-called linear distortion of the signal due to transmission through 
the network can be (and occasionally is) examined in terms of the effects 
of convolution, but the common practice among circuit engineers is to 
conduct the examination in terms of the corresponding frequency func- 
tions. There are good reasons for this common practice. The most im- 
portant of these reasons are: 

1. The relation between the frequency functions is simpler, viz. 

8(f) = K(/).&(/) 

where Y(f) is the transfer function of the network. 

2. The effects of amplitude distortion of the signal and of phase dis- 
tortion (of the unmodulated signal) may be examined independently. 
While phase distortion is critical in the transmission of pictures (fac- 
simile), it is relatively unimportant in the transmission of speech or 
music. 

3. The transmission characteristics of fixed linear networks are most 
easily calculated or measured accurately in terms of frequency rather 
than time. 

4. Fixed linear network design techniques based on frequency func- 
tions are today much further developed (simpler, more powerful, and 
more versatile) than those based on time functions. 

Convolution of frequency functions occurs in communications systems 
whenever a carrier wave is amplitude-modulated by a signal. If the input 
signal is Gi(t), and if the carrier wave is cos w t, then the output signal, 
with suppressed carrier, is 

G(t) = Gi(0 -cos corf 



* It may be of some help here to think of X as "excitation time", and of I as 
"response time". In the equivalent formulation 

G(t) = [ W(t)-Gi(1 - t) dr 



-L 



we may think of r = t — X as the "age" of input data at response time. 

At this point attention is called to a device which will be used many times to 
simplify analysis, which is to use — » and + » as limits of integration, letting the 
integrand take care of the effective range of integration. In this case, if (?i(X) = 
for X < to , and W(t) = for t < 0, the effective range of integration would be 
to <\ < t orO < t < t - to . 
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and the relation among the corresponding frequency functions is 

8{f) = &(/) * I [*(/ + /o) + *(/ - /<■)] 

= |Sx(/+/ ) + Mi(/-/o)- 

The convolution of frequency functions corresponding to the amplitude- 
modulation of a carrier wave is so naturally visualized simply as shifting 
the signal spectrum (frequency function) that it is almost never vis- 
ualized in any other way. It should be observed, however, that this 
point of view depends critically upon the two-sided specification of the 
signal spectrum, in amplitude and phase, to give the correct picture of 
the sidebands, whether the amplitude-modulation scheme under con- 
sideration be double-sideband, single sideband, vestigial sideband, or 
two-phase (as in TV chrominance signals). Further, the two-sided specifi- 
cation of the modulated-carrier spectrum is essential for a correct picture 
of the demodulation process used to recover the signal. 

For present purposes we will be interested in convolution not only as 
a tool for the synthesis of new transform-pairs but also as an analytical 
tool. For example, by regarding a time function G{t) as the product of 
two other time functions Gi(t) and G 2 (t) we can make use of the re- 
lation S(f) = <S'i(/) * S 2 (f) to reach insights about S(f) which do not 
come easily from the explicit form of S(f). 

To make convolution a useful analytical tool, we have to visualize it 
in some convenient way. This may be done in three ways. The relative 
merits of these three points of view depend upon the circumstances in 
any particular case. 

In the first place, convolution may be visualized as a stretching process. 
For example, in the equation 



(7(0 = f Gi(t - \)-G,(\)d\ 



we visualize G->(X)-d\ as a rectangular element of G 2 (t), originally con- 
centrated at t = X. This rectangular element is then stretched into the 
area under the elementary curve G\{t — X)-G->{\)-d\ regarded as a func- 
tion of /. This elementary curve has the shape of Gi(t) with origin shifted 
to / = X. The total effect at any particular value of t is then obtamed 
by integration over X. In this example, we have regarded G\(t) as the 
"stretcher" operating on each element of G»(t). Of course, since convolu- 
tion is commutative, we may interchange the roles of the two functions. 
In the second place, if one of the functions in the convolution consists 
exclusively of Dirac functions, each Dirac function may be regarded as 
a "shifter" operating on the other function in the convolution. For ex- 
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ample, 

8(t - a) * (7(0 = f 8(t - a - X)-G(X) dX = G(t - a). 

J— 00 

In the third place, convolution may be visualized as a weighted inte- 
gration with a moving weight function. For example, in the equation 



G(t) = C Gi(t - \)GM 



d\ 



we regard (7(0 as the integral of G 2 (X) with weight function Gi(t — X). 
The position of the weight function with respect to the X scale depends 
upon the value of t. In the event that the weight function has unit area, 
(7(0 may be regarded as the moving weighted average of (? 2 (X). (As 
previously noted, the roles of the two functions may be interchanged.) 

As an example of the use of the ideas described above, let us assume 
that we have a function G (t) which is zero outside of the interval 
< t < T, and for which the frequency function is S a (f). Let us gen- 
erate a periodic function G(t) by convolving G (t) with A {I; T). Then, 
since 

G{i) = G {t) *A(t;T) 
the frequency function corresponding to G(t) is, from Item 7 of Table IV, 

8(f) =So(/)-v(/;i). 

As we expect, S(f) consists of "lines" (of infinite height but finite area) 
at uniform intervals of \/T cps. The complex intensities (areas) of these 
lines represent the amplitudes and relative phases of the terms in the 
conventional Fourier series representation of G(i). Thus, 



cm = j_l sif) ' eiw ' df 
_ i «■ 

1 o=— 



q=oo / \ 

S0 \f)' 6 

As a second example, which is in a sense the dual of the first, let us 
assume that we have a function G (t) for which the frequency function 
S (f) is zero outside of the band -/o < / < /o . Let us generate a discrete 
time series (7(0 by sampling (7 (0 at uniform intervals of l/(2/ ) seconds. 
If we regard sampling as a multiplication by (or as amplitude-modula- 
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lion of) an infinite Dime eomb, then 

G(t) =G«(t)-A(t;±y 

Hence, the frequency function corresponding to G(t) is 

S(f) - Sotf) * V(/; 2/o), 
or, explicitly, 

S(f) = 2/o ■ £ So(f-2qf ). 

5=— oo 

If this frequency function is multiplied by a frequency function 81(f), 
where 

&(/) = -U l/l </• 

4/0 

= 0, |/|>/o 

it will revert to S (f). Thus, 

Sx(f)-8{f) = &(/). 
Hence, if GVO is the time function corresponding to Si(f), namely, 

then 

(?i(0 * <?(0 = Go(0- 

Thus, sampling G„(0 to get the discrete time series G(t), and convolving 
G(l) with Gi(/), restores G (t) exactly. This result reflects the well-known 
sampling theorem in information theory. The effect of sampling G {t) at 
uniform intervals of other than l/(2/ ) seconds is readily visualized. 

a. 4 Windows 

If a time function is even (and of course real), the corresponding fre- 
quency function is real (and of course even), and conversely. These cir- 
cumstances will prevail when we deal with autocovariance functions, 
power spectra, and appropriate weight functions. Under these circum- 
stances, the weight functions will be called windows. Such windows will 
be considered in transform-pairs, and the members of any pair will be 
distinguished as the lag window, and the spectral window. 
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Time windows convolved with periodic functions of time have been 
used by Guillemin," under the name "scanning functions", to examine 
the behavior of weighted partial sums of Fourier series. We use them in 
Sections B.4 and B.10 where we call them data windows, and their 
Fourier transforms (which may be complex) frequency windows. 

a. 5 Realistic Pairs from Unrealistic Pairs 

Transform-pairs which involve Dirac functions are very easily con- 
verted into a wide variety of realistic pairs. As an example, let us con- 
sider the sixth pair (infinite Dirac combs) which requires two convolu- 
tions for conversion to a realistic pair. If we convolve the time functions 
of the first and sixth pairs, taking T m <£ At, we get a time function 
which represents an infinite tram of narrow rectangular pulses of unit 
height. The corresponding frequency function still consists of Dirac 
functions but these now do not have a uniform intensity. If we next 
multiply the time function of this pair by the time function of the first 
pair, taking T m » At, we get a time function which represents a long but 
finite train of narrow rectangular pulses. The corresponding frequency 
function is continuous and consists chiefly of very narrow peaks of finite 
height approaching zero as / — > °o . 

A sinusoidal carrier wave of finite though great length may be repre- 
sented as the product of the time functions of the first and fourth pairs 
with T m » l//o • The corresponding frequency function is continuous 
and consists of very narrow peaks at ±/o , with much lower subsidiary 
peaks of height approaching zero as / — > « . 

If the time function of the third pah' is convolved with the time func- 
tion 

G(t) = t < 

= l f e~" T t>0, 
the resultant frequency function is 
S(f) = 



1 + iuT 



of which the absolute value falls off asymptotically like 1// as / — > °° , 
however small jT(>0) might be. 

In line with this discussion, it should be noted that a realistic "white 
noise" spectrum must be effectively band-limited by an asymptotic fall- 
off at least as fast as 1 // . Under certain circumstances, however, we may 
assume that the spectrum is flat to any frequency. Let us suppose that 
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the spectrum is in fact 

a" 

P(f) = T/ ' , f P(f) df = a 2 

where a is the variance. The autocovariance is 

C( T ) = a- 2 -e~ Welrl . 

If we transmit this noise through a network with an effective cutoff fre- 
quency well below /«, , we may assume for an approximation that 



and, therefore, that 



P(f) « °-j 

irfc 



TTjc 



although such an assumption is unrealistic if carried to indefinitely high 
frequencies (the input noise would have infinite variance). Hence, if the 
impulse response of the network is W(t), the autocovariance of the out- 
put noise is 

Coutfo - U) = ave \j_ W{ Ti )X(U - n) dn 

■ j_ WirJ-Xitj - r 2 )dr 2 

00 



2 etc 

C 



-^ f W(n) -Win- U + ti)dri. 

In particular, the variance of the output noise is 
C out (0) ^4 [" Wirt)] 2 dn 
which by Parseval's theorem is equivalent to 

Cout(o)«4 fVtf)l 2 # 

■njc J-=o 
where ]'(/) is the transfer function of the network. These results are 
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realistic. (The variance of the output noise is finite and approximately 
correct). 

a. 6 Some Trigonometric Identities 

In this section we develop some trigonometric identities which will be 
needed later on. We start with the equation 

2^ cos {\p + 2hu) = — : cos W + (b - a)u] 

- a sm u 

which is easily obtained by substituting 

ix I . — ix 

e + e 
cos x = _ 

in the left-hand member, summing the exponential terms and making 
some elementary trigonometric substitutions. By substituting ip ± t/2 
for yj/ we then get 

v-» . / . . <-»! \ sin (a + & + l)u . r . . „ >. , 

X sin (\p + 2hu) = — -. — sin [^ + (6 - a)u\. 

- a sin u 

Now, setting u = rf, and using the function introduced in Section A.2, 

sin pu __ (sin pirf)/pirf _ dif pf 
p sin u (sin irf )/irf dif / 

which, on differentiation, yields 

d (dif p/) _ /dif P f\ ( chfjtf _ dif A 
df (dif/) " V dif // V dif pf dif / ) • 

Before we rewrite our summation formulas in terms of such ratios of 
"dif" functions, we need to appreciate their behavior. For p not very 
small, (dif p/)/(dif /) behaves much like the numerator for pf small 
and moderate. The effect of the denominator is to force symmetry around 
integer multiples of £, so that the peak at / = is repeated at / = 1,2, 
3, • • ■ , thus making its behavior consistent with aliasing. For ^ / ^ £ 
its other effects are minor, since in this range (2/7r) ^ dif / ^ 1, while 
the extrema of dif pf have shrunk from +1 to ±2/(pir). For most con- 
siderations, therefore, we can approximate this ratio by the numerator. 

We now rewrite our summations as means, introducing (dif p/)/(dif /), 
finding 

— I— - ± cos (* + 2Arf) = dif(a lt +1)/ «■ [* + (6 - a)«f) 
a + b + 1 - a cat f 

t sin (* + 2M) = dif(a ^ +1)/ sin [* + <J> - a)*fl. 



a + b + 1 ^ dif / 
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Differentiating with respect to /, and multiplying through by 

-(a + 6 + l)/(2x), 

we get 

f,. ,, , 9J -v /dif (a + 6 + 1)A 
L * sin (, + 2h,f) = { ^ J 

tr_£ ( + h + 1) sin ty + (b - a)*f\ 

'(a + b + l) 2 dig (a + 6 + 1)/ 
2tt dif (a + 6 + 1)/ 

a + 6 + 1 dif' A r/ _i_ n \ /i 
with a similar formula for 

6 

2^ cos (^ + 2/itt/). 

—a 

We shall now use these formulas to obtain results about the average 
values of certain quadratic functions of chance variables X , Xi , • • • , 
X„ . The average value of any such quadratic function can be repre- 
sented in terms of a corresponding spectral window Q(f) in the form 

r Q(f)-2P(f)df 

Jo 
whenever 

ave {X t X t+q \ = f cos 2*qf-2P(f)df 

Jo 

for all suitable integers t and q, since the quadratic function can be 
expressed as a sum of multiples of terms of the form X t X t + q . To deter- 
mine the height, Q(fo), of the spectral window corresponding to a specific 
quadratic function, it suffices to consider the special case 2P(f) = 
S(f — /o), for which ave {X t Xi +q \ = cos 2irqf , when the average value 
of the quadratic function for such a special set of X t is exactly Q(fo)- 
If ave \X t X l+ g} = cos 2irqf, we easily find that 



ave 



[a + 1 + 1 5 Xh ) C + rf + 1 5 X ") 



11 

2 — i j ; , £ «o« 2irf(g — h) 



o+b+l^c+rf+1" 
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1 v^ dif (c + d + 1)/ / ,, , / j \ ,\ 

= — , - , 1 2^ ■ J.. , cos (-2rfh + (d - cM) 

a + o -+- 1 -o ail / 

dif (a + 6 + 1)/ dif (c + d + 1)/ ,, , n , 

= — ff-7 ^T7 cos (d - c - a + b)Tf 

dif / dif / 

« dif (a + 6 + 1)/ dif (c + d + 1)/ cos (d - c - a + 6)tt/ 

any of these expressions being the spectral window corresponding to 

Making the same assumption, we find that (where n = 21 + \) 

(±kx)(± g x)) 

= HJlgh cos 2wf(g - h) 



ave 



—t —e 



V- /dif nf\ (n dif nf n dif A „ , 



w 4 /dif nfY (\ dif n/ 1 dif / 



4 \ dif/ / \7r dif n/ rnr dif// ' 

These expressions therefore represent the spectral windows correspond- 
ing to 



t \ 2 

E hX h ) . 
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Glossary of Terms 

Add-and-subtracl method 

A method of roughly estimating spectra based on successive additions 
by non-overlapping two's followed by a differencing. (18, B.1.8 and 
B.28.) 

Alias 

In equally spaced data, two frequencies are aliases of one another if 
sinusoids of the corresponding frequencies cannot be distinguished by 
their equally spaced values (this occurs when f x = 2kf N ± / 2 for integer 
k) ; the principal aliases lie in the interval —f N ^ / ^ /#. (See also 14.) 
(Also aliased, aliasing, etc.) 

Aliased spectrum 

See Spectrum, aliased. 

Analysis, pilot 

Any of a number of methods of obtaining a rough spectrum, including 
the add-and-subtract method (18, etc.) the cascade method (B.18), the 
complete add-and-subtract method (B.18). 

Autocorrelation function 

The normalized autocovariance function (normalized so that its value 
for lag zero is unity). 

Autocovariance function 

The covariance between X(t) and X(t + r) as a function of the lag t. 
If averages of X(t) and X(t -\- r) are zero, it is equal to the average value 
of X(t) -X(t + t). It can be defined for a whole ensemble, a whole func- 
tion stretching from — <» to + « , or for a finite piece of a function; in 
the latter case it is called the apparent autocovariance function (see 4). 
Certain related functions are called modified apparent autocovariance 
functions (also see 4). 

Autoregressive series 

A time series generated from another time series as the solution of a 



270 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1958 

linear difference equation. (Usually where previous values of output se- 
ries enter into determination of current value.) 

Average 

The arithmetic mean, usually over an ensemble, a population, or some 
reasonable facsimile thereof. 

Band-limited function 

Strictly, a function whose Fourier transform vanishes outside some 
finite interval (and hence is an entire function of exponential type); 
practically, a function whose Fourier transform is very small outside 
some finite interval. 

Box-car function 

A function zero except over a finite interval, in the interior of which it 
takes a constant value (often -f-1)- 

Cardinal theorem (of interpolation theory) 

A precise statement of the conditions under which values given at a 
doubly infinite set of equally spaced points can be interpolated (with the 
aid of the function (sin (x - Xi))/(x - x { ) to yield a continuous band- 
limited function. (See B.l.) 

Cascade process (of spectral estimation) 

A process of spectral estimation in which a single step is repeated 
again and again, each step yielding both certain estimates and a con- 
densed set of data (ready for input to the next step). (See B.18.) 

Chi-square 

A quantity distributed (strictly exactly, but practically approxi- 
mately) as x* + X2 + • • • + xl 2 where Xt , x 2 , ■ ■ ■ , x k are independent 
and Gaussian, and have average zero and variance unity. 

Continuous -power spectrum 

A power spectrum representable by the indefinite integral of a suit- 
able (spectral density) function. (All power spectra of physical systems 
are continuous.) 
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Convolution 

The operation on one side of a Fourier transformation corresponding 
to multiplication on the other side. (See A. 3 for detailed discussion.) 

Cosine transform 

A series (see 13) or integral (see 2) transform in which a cosine of 
the product of the variables is the kernel. 

Covariance 

A measure of (linear) common variation between two quantities, equal 
to the average product of deviations from averages. (See 1.) 

Cross-spectrum 

The expression of the mutual frequency properties of two series analo- 
gous to the spectrum of a single series. (Because mutual relations at a 
single frequency can be in phase, in quadrature, or in any mixture of 
these, either a single complex-valued cross-spectrum or a pair of real- 
valued cross-spectra are required.) (Also cross-spectral.) 

Data 

As specifically used in this paper, values given at equally spaced 
intervals of time (often called time series). 

Data window 

A time function which vanishes outside a given interval and which is 
regarded as multiplying data or signals defined for a more extended 
period. (Data windows are usually smooth (graded) to improve the qual- 
ity of later frequency analysis.) 

Degrees of freedom 

As applied to chi-square distributions arising from quadratic forms in 
Gaussian (normal) variables, the number of linearly independent squared 
terms of equal size into which the form can be divided. In general, a 
measure of stability equal to twice the square of the average divided by 
the variance. 

Delta-component 
A finite contribution to the spectrum at one frequency (B.10 only). 
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Diffraction function 

... sin 7t.t 

dii x = . 

irx 

Dirac comb 

An array of equally spaced Dirac functions, usually most of which 
are of equal height. 

Dirac function 

The limit of functions of unit integral concentrated in smaller and 
smaller intervals near zero. (See A.2 for fuller discussion.) 

Distortion 

Failure of output to match input, (Often specified as to kind of failure 
as linear, amplitude, phase, non-linear, etc., cp. A.3.) 

Effective record length 

Actual length of record available reduced to allow for end effects. 
(See 6.) 

Elementary frequency band 

An interval of frequency conveniently thought of as containing "a 
single degree of freedom", equal to the reciprocal of twice the duration 
of observation or record. (Since both sines and cosines may occur, it 
requires two elementary frequency bands to contain "an independently 
observable frequency.") 

Ensemble 

A family of functions (here functions of either continuous or equi- 
spaced time) with probabilities assigned to relevant sub-families. 

Equivalent number (of degrees of freedom) 
See second sentence under degrees of freedom. 

Equivalent width 

The extent of a function regarded as a window as expressed by the 
ratio of the square of its integral to the integral of its square. (See 8.) 

Filtered spectrum 

Spectrum of the output from any process which can be regarded as a 
filter. 
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Folding frequency 

The lowest frequency which "is its own alias", that is, is the limit of 
both a sequence of frequencies and of the sequence of their aliases, given 
by the reciprocal of twice the time-spacing between values, also called 
Nyquist frequency. 

Fourier transform 

Operations making functions out of functions by integration against a 
kernel of the form exponential function of \/— 1 times frequency times 
time. Often, including here, defined differently for transforming time 
functions into frequency functions than for transforming frequency 
functions into time functions. (See A.l for details.) 

Frequency 

A measure of rate of repetition; unless otherwise specified, the num- 
ber of cycles per second. The angular frequency is measured in radians 
per second, and is, consequently, larger by a factor of 2ir. 

Frequency window 
The Fourier transform of a data window. 

Gaussian 

A single quantity, or a finite number of quantities distributed accord- 
ing to a probability density representable as e to the power minus a 
quadratic form. (Also called normal, Ma.viccUian, etc.) Also, a function 
or ensemble, distributed in such a way that all finite sections are Gaus- 
sian. (See 1.) 

Hamming 

The operation of smoothing with weights 0.23, 0.54 and 0.23. (After 
R. W. Hamming.) 

Manning 

The operation of smoothing with weights 0.25, 0.50 and 0.25. (After 
Julius von Hann.) 

Hyper directive antenna 

An antenna or antenna system so energized as to have a more compact 
directional pattern than naturally corresponds to its extent (as measured 
in wavelengths). 
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Impulse response 

The time function describing a linear system in terms of the output 
resulting from an input described by a Dirac function. 

Independence (statistical, of estimates) 

In general, two quantities are statistically independent if they possess 
a joint distribution such that (incomplete or complete) knowledge of 
one does not alter the distribution of the other. Estimates are statis- 
tically independent if this property holds for each fixed true situation. 

Independent phases 

An ensemble has independent phases when it can be approximated by 
ensembles consisting of finite sums of (phased) cosines (of fixed fre- 
quencies) whose phases are mutually independent. Continuous spectrum 
and independent phases imply Gaussian character. Every Gaussian 
ensemble has independent phases. 

Intermodulation distortion 

Non-linear distortion, especially as recognized in the output of a 
system when two or more frequencies enter the input simultaneously. 

Joint probability distribution 

Expression of the probability of simultaneous occurrence of values of 
two or more quantities. 

Lag 

A difference in time (epoch) of two events or values considered to- 
gether. 

Lag urindow 

A function of lag, vanishing outside a finite interval, and either mul- 
tiplying or regarded as multiplying the quantities of a family of quantities 
with differing lags. 

Lagged product 

The product of two values corresponding to different times. (In a 
mean lagged product the lags are usually all the same.) 

Lead 

The negative of lag. 
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Line (in a power spectrum) 

Theoretically, and as used in this paper, a finite contribution asso- 
ciated with a single frequency. Physically, not used here, a finite con- 
tribution associated with a very narrow spectral region. 

Lobe 

A bulge, positive or negative, especially in a spectral window. (In most 
spectral windows, a large central main lobe is surrounded on both sides 
by smaller side lobes.) 

Mean lagged product 
The (arithmetic) mean of products of equally lagged quantities. 

Moving linear combination 

A transformation expressing the values of an output time series as 
linear combinations of values of the input series in specified relations of 
lag (or lead). 

Negative frequencies 

When sines and cosines are jointly represented by two imaginary ex- 
ponentials, one has a positive frequency and the other a negative fre- 
quency. (Not specifiable for a single time function in real terms.) 

Network (linear) 

In this account, an otherwise unspecified physical device which con- 
verts an input function (of continuous time) linearly into an output func- 
tion (of continuous time). 

Noise 

In general, an undesired time-function, or component of a function. 

Non-normality 

Failure to follow a normal or Gaussian distribution. 

Normality 
The property of following a normal or Gaussian distribution. 

Nyquist frequency 

The lowest frequency coinciding with one of its own aliases, the re- 
ciprocal of twice the time interval between values (same as folding fre- 
quency). 
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Octave 
An interval of frequencies, the highest of which is double the lowest. 

Pilot (analysis or estimation) 

A process yielding rough estimates of spectral density intended mainly 
as a basis for planning more complete and precise analyses. 

Population 

A collection of objects (in particular, of numbers or of functions), with 
probabilities attached to relevant subcollections. 

Power transfer function 

The function expressing the ratio of output power near a given fre- 
quency to the input power near that frequency. 

Power-variance spectrum 

A function of frequency, in terms of which the variances and covari- 
ances of a family of spectral estimates can be expressed in standard 
form. (See 6 and 14 for details in the continuous and equi-spaced 
cases, respectively.) 

Preemphasis 

Emphasis of certain frequencies (in comparison with others), before 
processing, as an aid to the quality of result. 

Prewhitening 

Preemphasis designed to make the spectral density more nearly con- 
stant (the spectrum more nearly flat). 

Principal alias 

An alias falling between zero and plus or minus the folding or Ny- 
quist frequency. 

Process (random or stochastic) 

An ensemble of functions. (Often composed of functions of time re- 
garded as unfolding or developing.) 

Protection ratio 

The ratio of transmission at a desired frequency to the transmission 
at an undesired alias of that frequency. 
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Recording 

Is spaced when originally taken at equal intervals, mixed when taken 
continuously and processed at equal intervals, continuous when taken 
and processed on a continuous basis. 

Resolution 

A measure of the concentration of a spectral estimate expressed in 
frequency units, here taken (for the important cases) as equal to the 
width of the major lobe. (See B.23.) 

Resolved bands {number of) 

The ratio of the Nyquist or folding frequency to the resolution. 

Sampling theorem (of information theory) 

Nyquist's result that equi-spaced data, with two or more point-* per cycle 
of highest frequency, allows reconstruction of band-limited functions. 
(See Cardinal theorem.) 

Serial correlation coefficients 

Ratios of the autocovariances to the variance of a process, ensemble, 
etc. 

Signal 
A time function desired as (potentially) carrying intelligence. 

"Signal" 

A function of continuous time, which may be either a signal, a noise, 
or a combination of both. (Contrasted with data, a function of discrete 
time.) 

Single function approach 

A mode of representing certain ensembles by the translations of a 
single time function (in single function terms). 

Smoothed function 

The result of weighted averaging of nearby values of the original 
function. 

Smoothing 

In the narrow sense, forming (continuous or discrete) moving linear 
combinations with unit total weight. 
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Smoothing and decimation procedure 

A procedure which may be regarded as the formation of discrete mov- 
ing linear combinations, followed by the omission of all but every fcth 
such. (See 17 and B.17.) 

Spectrum (also power spectrum) 

An expression of the second moments of an ensemble, process, etc. (i) 
in terms of frequencies, (ii) in such a form as to diagonalize the effects 
on second moments of time-invariant linear transformations applied to 
the ensemble or process, (adjective: spectral). 

Spectrum, aliased 

For equally spaced data, the principal part of the aliased spectrum 
expresses contributions to the variance in terms of frequencies between 
zero and the Nyquist or folding frequency, all contributions from fre- 
quencies having the same principal alias and sign having been combined 
by addition. (The aliased spectrum repeats the principal part periodically 
with period 2f N . See 14.) 

Spectral density 

A value of a function (or the entire function) whose integral over any 
frequency interval represents the contribution to the variance from that 
frequency interval. 

Spectral density estimates 

Estimates of spectral density, termed raw when obtained from equi- 
spaced mean lagged products by cosine series transformation, refined 
when hanned or hammed from raw estimates or obtained by an equiva- 
lent process. (See B.13.) 

Spectral window 

A function of frequency expressing the contribution of the spectral 
density at each frequency to the average value of an estimate of 
(smoothed) spectral density. 

Stationary (ensemble or random process) 

An ensemble of time functions (or random process) is stationary if 
any translation of the time origin leaves its statistical properties un- 
affected. 
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Superposition theorem 

A statement that the output of a linear device is the convolution of its 
input with its impulse response. (See B.2.) 

Temporally homogeneous 

Sometimes used in place of stationary, especially when speaking of 
stochastic processes. 

Transfer function 

The transfer function of a network or other linear device is a complex- 
valued function expressing the amplitude and phase changes suffered by 
cosinusoidal inputs in becoming outputs. (See A.5.) The square of the 
absolute value of the transfer function is the power transfer function, 
which expresses the factors by which spectral densities are changed as 
inputs become outputs. (See 4.) 

Transmission 

The coefficient with which power at a given frequency contributes to 
power at the (new) principal alias as a result of the application of a 
smoothing and decimation procedure. 

Transversal filtering 

Time domain filtering by forming linear combinations of lagged values, 
use of moving linear combinations for filtering. (See Kallmann for the 
origin of this term.) 

Trend 

A systematic, smooth component of a time function (time series), as, 
for example, a linear function of time (a linear trend). 

True 

Often used to refer to average values over the ensemble, as contrasted 
with mean values over the observations. 

Universe 

A collection of objects (numbers, functions, etc.) with probabilities 
attached to relevant subcollections. 

Variance 

A quadratic measure of variability, the average squared deviation 
from the average. 
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White noise 

An ensemble whose spectral density is sensibly constant from zero 
frequency through the frequencies of interest (in equi-spaced situations, 
up to the folding or Nyquist frequency). (The values of equi-spaced 
white noise at different times are independent.) 

Window 

A function expressing, as a multiplicative factor, the tendency or 
possibility of the various values of some function to enter into some 
calculation or contribute to the average value of some quantity. (See 
data, lag, spectral, etc. for specific instances.) 

Windowless quadratic 

A quadratic expression is windowless if its average value vanishes 
for every stationary ensemble of finite variance (See B.19). 

Window pair- 
Two windows related by a Fourier transformation, as lag and spectral 
windows or data and frequency windows. (See A.4 and 4.) 

Zero-frequency waves (cosine and sine) 

The limiting forms of very-low-frequency cosinusoids, namely con- 
stants and linear trends. (See 19.) 
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